THE MATHEMATICAL THEORY OF
PROBABILITIES
THE MACMILLAN COMPANY
NEW YORK BOSTON CHICAGO
DALLAS SAN FRANCISCO
MACMILLAN & CO., Limited
LONDON BOMBAY CALCUTTA
MELBOURNE
THE MACMILLAN CO. OF CANADA, LTD.
TORONTO
THE MATHEMATICAL THEORY
OF
PROBABILITIES
AND
ITS APPLICATION TO FREQUENCY CURVES AND
STATISTICAL METHODS
BY
ARNE FISHER, F.S.S.
(LONDON)
TRANSLATED AND EDITED
FROM THE AUTHOR'S ORIGINAL DANISH NOTES
WITH THE ASSISTANCE OF
WILLIAM BONYNGE, B.A.
(BELFAST)
WITH AN INTRODUCTORY NOTE BY
F. W. FRANKLAND, F.I.A., F.A.S., F.S.S.
EXAMINER IN STATISTICAL METHOD AND IN PURE MATHEMATICS TO THE
GOVERNMENT OF NEW ZEALAND
VOLUME I.
Mathematical Probabilities and Homograde
Statistics
New York
THE MACMILLAN COMPANY
1915
Copyright, 1915
By Arne Fisher
Set up and electrotyped. Published November, 1915
PRESS OF
THE NEW ERA PRINTING COMPANY
LANCASTER, PA.
DEDICATED BY PERMISSION TO
JOHN BODINE LUNGER, ESQ.
VICE-PRESIDENT OF
THE EQUITABLE LIFE ASSURANCE SOCIETY
OF THE UNITED STATES,
AND ONE OF THE ORGANIZERS OF THE ACTUARIAL SOCIETY OF AMERICA
AS A TOKEN OF ESTEEM FOR THE GREAT INTEREST WHICH HE, SO
MARKEDLY AMONG THE DISTINGUISHED LIFE INSURANCE EXECUTIVES
OF AMERICA, HAS ALWAYS SHOWN IN WHATEVER
HAS TENDED TO PROMOTE THE STUDY AND DEVELOPMENT
OF ACTUARIAL AND STATISTICAL SCIENCE
INTRODUCTORY NOTE.
I feel it a great honor to have been asked by my friend and
colleague, Mr. Arne Fisher, of the Equitable Life Assurance
Society of the United States, to write an introductory note to
what appears to me the finest book as yet compiled in the English
language on the subject of which it treats. As an Examiner
myself in Statistical Method for a British Colonial Government,
it has been to me a heart-breaking experience, when implored by
intending candidates for examination to recommend a text-book
dealing with Mr. Fisher's subject matter, that it has heretofore
been impossible for me to recommend one in the English language
which covers the whole of the ground. Until comparatively
recent years the case was even worse. While in French, in Italian,
in German, in Danish, and in Dutch, scientific works on statistics
were available galore, the dearth of such literature in the English
language was little short of a national or racial scandal. With
such works as those of Yule and Bowley, in recent years, there
has been some possibility for the English-speaking student to
acquire part of the knowledge needed. But it is hardly necessary
to point out what a very large amount of new ground is covered
by Mr. Fisher's new book as compared with such works as I have
referred to.
Despite my professional connection with statistical and actu-
arial work of a technical character my own personal interest in
Mr. Fisher's book is concentrated principally on the metaphysical
basis of the Probability-theory, and it is with regard to this
aspect of the subject alone that I feel qualified to comment on his
achievement. With all the controversy that has gone on through
many decades among metaphysicians and among writers on logic
interested especially in the bases of the theories of probability and
induction, between the pure empiricists of the type of J. S. Mill
and John Venn (at all events in the earliest edition of his work)
on the one hand, and the (partly) a priori theorists who base their
doctrine on the foundation of Laplace on the other hand, it has
been a source of intense satisfaction to me, as in the main a dis-
ciple of the latter group of theorists, to note the masterly way in
which Mr. Arne Fisher disentangles the issues which arise in the
keen and sometimes almost embittered controversy between these
two schools of thought. It has always seemed to the present
writer as if the very foundations of Epistemology were involved
in this controversy. The impossibility of deriving the corpus of
human knowledge exclusively from empirical data by any logic-
ally valid process — an impossibility which led Immanuel Kant
to the creation of his epoch-making philosophical system — is
hardly anywhere made more evident than in what seems to the
present writer the unsuccessful effort of thinkers like John Venn
to derive from such purely empirical data the entire Theory of
Probability. The logical fallacy of the process is analogous to
that perpetrated by John Stuart Mill in endeavoring to base the
Law of Causality on what he termed an "inductio per simplicem
enumerationem." Probably there is nowhere a more trenchant
and conclusive exposure of the unsoundness of this point of view,
than in the Right Honorable Arthur James Balfour's monu-
mental work " A Defense of Philosophic Doubt." It is there-
fore satisfactory to find that Mr. Fisher emphasizes, quite at the
beginning of his treatise, that an a priori foundation for "Probability"
judgments is indispensable.
Hardly less gratifying, from the metaphysical point of view,
is Mr. Fisher's treatment of the celebrated quaestio vexata of
Inverse Probabilities and his qualified vindication of Bayes'
Rule against its modern detractors.
Aside altogether from metaphysics, it is particularly satis-
factory to note the full and clear way in which the author treats
the Lexian Theory of Dispersion and of the "Stability" of statistical
series and the extension of this theory by recent Scandinavian
and Russian investigators, — a branch of the science which
has till the appearance of this new work not been adequately
covered in English text-books.
It may of course be a moot question whether the preference
given by our author to Charlier's method of treating "Frequency
Curves" over the method of Professor Karl Pearson is well
advised. But whatever the experts' verdict may be on debatable
questions like these, the scientific world is to be congratulated on
Mr. Fisher's presentment of a new and sound point of view, and
he emphatically is to be congratulated on the production of a
text-book which for many years to come will be invaluable both
to students and to his confreres who are engaged in extending
the boundaries of this fascinating science.
F. W. Frankland,
Member of the Actuarial Society of America,
Fellow of the Institute of Actuaries of Great
Britain and Ireland, and Fellow of the
Royal Statistical Society of London.
New York,
October 1, 1915.
PREFACE.
" Probability " has long ago ceased to be a mere theory of games
of chance and is everywhere, especially on the continent, regarded
as one of the most important branches of applied mathematics.
This is proven by the increasing number of standard text-books in
French, German, Italian, Scandinavian and Russian which have
appeared during the last ten years. During this time the research
work in the theory of probabilities has received a new impetus
through the labors of the English biometricians under the leader-
ship of Pearson, the Scandinavian statisticians Westergaard,
Charlier and Kiær, the German statistical school under Lexis, and
the brilliant investigations of the Russian school of statisticians.
Each group of these investigations seems, however, to have
moved along its own particular lines. The English schools have
mostly limited their investigations to the field of biology as pub-
lished in the extensive memoirs in the highly specialized journal,
Biometrika. The Scandinavian scholars have produced researches
of a more general character, but most of these researches are un-
fortunately contained in Scandinavian scientific journals and are
for this reason out of reach to the great majority of readers who
are not familiar with any of the allied Scandinavian languages.
This applies in a still greater degree to the Russians. German
scholars of the Lexis school have also contributed important
memoirs, but strangely enough their researches are little known
in this country or in England, a fact which is emphasized through
the belated English discussion on the theory of dispersion as devel-
oped by Lexis and his disciples. The same can also be said with
regard to the Italian statisticians.
In the present work I have attempted to treat all these modern
researches from a common point of view, based upon the mathe-
matical principles as contained in the immortal work of the great
Laplace, "Theorie analytique des Probabilités," a work which
despite its age remains the most important contribution to the
theory of probabilities to our present day. Charlier has rightly
observed that the modern statistical methods may be based upon
a few condensed rules contained in the great work of Laplace.
This holds true despite the fact that many modern English
writers of late have shown a certain distrust, not to say actual
hostility, towards the so-called mathematical probabilities as
defined by the French savant, and have in their place adopted the
purely empirical probability ratios as defined by Mill, Venn and
Chrystal. It is quite true that it is possible to build a consistent
theory of such ratios, as for instance is done by the Danish
astronomer and actuary, Thiele. The theory, however, then
becomes purely a theory of observations in which the theory of
probability takes a secondary place. The distrust in the so-called
mathematical a priori probabilities of Laplace I believe, however,
to be unfounded, and the criticism to which that particular kind
of probabilities is subjected by a few of the modern English
writers is, I believe, due to a misapprehension of the true nature
of the Bernoullian Theorem. This renowned theorem remains
to-day the cornerstone of the theory of statistics, and upon it I
have based the most important chapters of the present work.
Following the beautiful investigations of Tschebycheff and
Pizetti in their proofs of Bernoulli's Theorem and the closely
related theorem of large numbers by Poisson I have adopted the
methods of the Swedish astronomer and statistician, Charlier,
in the discussion of the Lexian dispersion theory.
The theory of frequency curves is treated from various points
of view. I have first given a short historical introduction to the
various investigations of the law of errors. The Gaussian
normal curve of error was by the older school of statisticians
held to be sufficient to represent all statistical frequencies, and
actual observed deviations from the normal curve were attributed
to the limited number of observations. Through the original
memoirs of Lexis and the investigations of Thiele the fallacy of
such a dogmatic belief was finally shown. The researches of
Thiele, and later of Pearson, developed the theory of skew
curves of error. As recently as 1905 Charlier finally showed
that the whole theory of errors or frequency curves may be
brought back to the principles of Laplace. I have treated this
subject by the methods of both Pearson and Charlier, although I
have given the methods of the latter a predominant place, because
of their easy and simple application in the practical computations
required by statistical work. The mathematical theory of cor-
relation, which is treated in an elementary manner only, is based
upon the same principles.
The statistical examples serve as illustrations of the theory, and
it will be noted that it is possible to solve all the important sta-
tistical problems presenting themselves in daily work on the basis
of a theory of mathematical probabilities instead of on a direct
theory of statistical methods. I have here again followed Charlier
in dividing all statistical problems into two distinct groups,
namely, the homograde and the heterograde groups.
In treating the philosophical side of the subject I have naturally
not gone into much detail. However, I have tried to emphasize
the two diametrically opposite standpoints, namely what von Kries
has called the principle of "cogent reason,"
and the principle which Boole has aptly termed "the equal
distribution of ignorance." These two principles are clearly illus-
trated in the case of the so-called inverse probabilities. As far as
pure theory is concerned, the theory of "inverse probabilities"
is rigorous enough. It is only when making practical applications
of the rule of inverse probabilities (the so-called Bayes' Rule)
that many writers have made a fatal mistake by tacitly assuming
the principle of "insufficient reason" as the only true rule of
computation. This leads to paradoxical results as illustrated by the
practical problem from the region of actuarial science in Chapter
VI in this book.
In a work of this character I have naturally made an extended
use of the higher mathematical analysis. However, the reader
who is not versed in these higher methods need not feel alarmed
on this account, as the elementary chapters are arranged in such a
way that the more difficult paragraphs may be left out. I have
in fact divided the treatise into two separate parts. The first
part embraces the mathematical probabilities proper and their
applications to homograde statistical series. This part, I think,
constitutes what is usually given as a course in vital statistics in
many American colleges. I hardly deem it worth while to give a
detailed discussion on the collection and arrangement of the sta-
tistical data as to various frequency distributions. The mere
graphical and serial representation of frequency functions by
means of histographs and frequency columns is so simple and
evident that a detailed description seems superfluous. The fitting
of the various curves to analytical formulas and the determination
of the various parameters seem to me of much greater impor-
tance. The theory of curve fitting which is treated in the second
volume is founded upon a more advanced mathematical analysis
and is for this reason out of reach to the average American student
who desires to learn only the rudiments of modern statistical
methods. Practical statisticians, on the other hand, will derive
much benefit from these higher methods. It is a fact generally
noted in mathematics that the practical application of a difficult
theory is much simpler than that of a more elementary theory.
This is amply proven by the appearance of an excellent little
Scandinavian brochure by Charlier: "Grunddragen af den mate-
matiske Statistikken." ("Rudiments of Mathematical Statis-
tics.") I have always attempted to adapt theory to actual
practical problems and requirements rather than to give a purely
mathematical abstract discussion. In fact it has been my aim
to present a theory of probabilities as developed in recent years
which would prove of value to the practical statistician, the
actuary, the biologist, the engineer and the medical man, as
well as to the student who studies mathematics for the sake of
mathematics alone.
The nucleus of this work consisted of a number of notes written
in Danish on various aspects of the theory of probabilities, col-
lected from a great number of mathematical, philosophical and
economic writings in various languages. At the suggestion of
my former esteemed chief, Mr. H. W. Robertson, F.A.S., As-
sistant Actuary of the Equitable Life Assurance Society of the
United States, I was encouraged to collect these fragmentary
notes in systematic form. The rendering in English was done
by myself personally with the assistance of Mr. W. Bonynge.
With his assistance most of the idiomatic errors due to my
barbaric Dano-English have been eliminated. The notes stand,
however, in the main as a faithful reproduction of my original
English copy. Although the resulting "Dano-English" may
have its great shortcomings as to rhetoric and grammar, I hope
to have succeeded in expressing what I wanted to say in such
a manner that my possible readers may follow me without
difficulty.
I gladly take the opportunity of expressing my thanks to a
number of friends and colleagues who in various ways have as-
sisted me in the preparation of this work. My most grateful
thanks are due to Mr. F. W. Frankland, Mr. H. W. Robertson
and Mr. Wm. Bonynge not only for reading the manuscript and
most of the proofs, but also for the friendly help and encourage-
ment in the completion of this volume. The introductory note
by Mr. Frankland, coming from the pen of a scholar who for the
most of a life-time has worked with statistical-mathematical
subjects and who has taken a special interest in the philosophical
and metaphysical aspects of the probability theory, I regard as
one of the strong points of the book. My debts to Messrs.
Frankland and Robertson as well as to Dr. W. Strong, Associate
Actuary of the Mutual Life Insurance Company, are indeed of
such a nature that they cannot be expressed in a formal preface.
My thanks are also due to Mr. A. Pettigrew for correcting the
first rough draught of the first three chapters at a time when my
knowledge of English was most rudimentary, to Mr. M. Dawson,
Consulting Actuary, and Mr. R. Henderson, Actuary of the Equitable
Life, for reading a few chapters in manuscript and making
certain critical suggestions, to Professors C. Grove and W. Fite, of
Columbia University, for numerous technical hints in the working
out of various mathematical formulas in Chapter VI, to Miss
G. Morse, librarian of the Equitable Library, for help in the search
for certain bibliographical material. Last but not least I wish to
express my sincerest thanks to several of my Scandinavian com-
patriots for allowing me to quote and use their researches on
various statistical subjects. I want in this connection especially
to mention Professor Charlier, of Lund, and Professors Wester-
gaard and Johannsen, of Copenhagen.
To The Macmillan Company and The New Era Printing Com-
pany I beg leave to convey my sincere appreciation of their very
courteous and accommodating attitude in the manufacture of
this work. Their spirit has been far from commercial in this —
from a pure business standpoint — somewhat doubtful under-
taking.
Arne Fisher.
New York,
October, 1915.
TABLE OF CONTENTS.
Chapter I.
Introduction: General Principles and Philosophical Aspects.
Page
1. Methods of Attack 1
2. Law of Causality 1
3. Hypothetical Judgments 3
4. Hypothetical Disjunctive Judgments 4
5. General Definition of the Probability of an Event 5
6. Equally likely Cases 6
7. Objective and Subjective Probabilities 8
Chapter II.
Historical and Bibliographical Notes.
8. Pioneer Writers 11
9. Bernoulli, de Moivre and Bayes 12
10. Application to Statistical Data 13
11. Laplace and Modern Writers 14
Chapter III.
The Mathematical Theory of Probabilities.
12. Definition of Mathematical Probability 17
13. Example 1 18
14. Example 2 20
15. Example 3 20
16. Example 5 22
17. Example 6 23
Chapter IV.
The Addition and Multiplication Theorems in Probabilities.
18. Systematic Treatment by Laplace 26
19. Definition of Technical Terms 26
20. The Theorem of the Complete or Total Probability, or the Proba-
bility of "Either Or" 27
21. Theorem of the Compound Probability or the Probability of "As
Well As" 28
22. Poincaré's Proof of the Addition and Multiplication Theorem 30
23. Relative Probabilities 31
24. Multiplication Theorem 33
25. Probability of Repetitions 33
26. Application of the Addition and Multiplication Theorems in Problems
in Probabilities 35
27. Example 12 35
28. Example 13 36
29. Example 14 37
30. Example 15 37
31. Example 16 38
32. Example 17 39
33. Example 18. De Moivre's Problem 40
34. Example 19 42
35. Example 20. Tchebycheff's Problem 46
Chapter V.
Mathematical Expectation.
36. Definition, Mean Values 49
37. The Petrograd (St. Petersburg) Problem 51
38. Various Explanations of the Paradox. The Moral Expectation 51
Chapter VI.
Probability a Posteriori.
39. Bayes's Rule. A Posteriori Probabilities 54
40. Discovery and History of the Rule 55
41. Bayes's Rule (Case I) 56
42. Bayes's Rule (Case II) 59
43. Determination of the Probabilities of Future Events Based upon
Actual Observations 59
44. Examples on the Application of Bayes's Rule 61
45. Criticism of Bayes's Rule 62
46. Theory versus Practice 64
47. Probabilities expressed by Integrals 67
48. Example 24 70
49. Example 25. Bing's Paradox 72
50. Conclusion 76
Chapter VII.
The Law of Large Numbers.
51. A Priori and Empirical Probabilities 82
52. Extent and Usage of Both Methods 85
53. Average a Priori Probabilities 87
54. The Theory of Dispersion 88
55. Historical Development of the Law of Large Numbers 89
Chapter VIII.
Introductory Formulae from the Infinitesimal Calculus.
56. Special Integrals 90
57. Wallis's Expression of π as an Infinite Product 90
58. De Moivre-Stirling's Formula 92
Chapteb IX.
Law of Large Numbers. Mathematical Deduction.
59. Repeated Trials 96
60. Most Probable Value 97
61. Simple Numerical Examples 97
62. The Most Probable Value in a Series of Repeated Trials 99
63. Approximate Calculation of the Maximum Term, Tm 101
64. Expected or Probable Value 102
65. Summation Method of Laplace. The Mean Error 104
66. Mean Error of Various Algebraic Expressions 106
67. Tchebycheff's Theorem 108
68. The Theorems of Poisson and Bernoulli proved by the Application
of the Tchebycheffian Criterion 110
69. Bernoullian Scheme 110
70. Poisson's Scheme 111
71. Relation between Empirical Frequency Ratios and Mathematical
Probabilities 114
72. Application of the Tchebycheffian Criterion 115
Chapteb X.
The Theory of Dispersion and the Criterions of Lexis and Charlier.
73. Bernoullian, Poisson and Lexis Series 117
74. The Mean and Dispersion 118
74a. Mean or Average Deviation 122
75. The Lexian Ratio and Charlier Coefficient of Disturbancy 124
Chapter XI.
Application to Games of Chance and Statistical Problems.
76. Correlate between Theory and Practice 127
77. Homograde and Heterograde Series. Technical Terms 128
78. Computation of the Mean and the Dispersion in Practice 130
79. Westergaard's Experiments 136
80. Charlier's Experiments 137
81. Experiments by Bonynge and Fisher 141
Chapter XII.
Continuation of the Application of the Theory of Probabilities to
Homograde Statistical Series.
82. General Remarks 146
83. Analogy between Statistical Data and Mathematical Probabilities. . 147
84. Number of Comparison and Proportional Factors 149
85. Child Births in Sweden 151
86. Child Births in Denmark 152
87. Danish Marriage Series 153
88. Stillbirths 154
89. Coal Mine Fatalities 155
90. Reduced and Weighted Series in Statistics 157
91. Secular and Periodical Fluctuations 161
92. Cancer Statistics 165
93. Application of the Lexian Dispersion Theory in Actuarial Science.
Conclusion 167
CHAPTER I.
INTRODUCTION: GENERAL PRINCIPLES AND PHILOSOPHICAL
ASPECTS.
1. Methods of Attack. — The subject of the theory of probabilities
may be attacked in two different ways, namely in a
philosophical, and in a mathematical manner. At first, the
subject originated as isolated mathematical problems from games
of chance. The pioneer writers on probability such as Cardano,
Galileo, Pascal, Fermat, and Huyghens treated it in this way.
The famous Bernoulli was, perhaps, the first to view the subject
from the philosopher's point of view. Laplace wrote his well-known
"Essai philosophique sur les probabilités," wherein he describes
the whole science of probability as the application of common
sense. During the last thirty years numerous eminent philo-
sophical scholars such as Mill, Venn, and Keynes of England,
Bertrand and Poincaré of France, Sigwart, von Kries and Lange
of Germany, Kroman of Denmark, and several Russian scholars
have written on the philosophical aspect.
In the ordinary presentation of the elements of the theory of
probability as found in most English text-books, the treatment
is wholly mathematical. The student is given the definition of
a mathematical probability and the elementary theorems are
then proved. We shall, in the following chapter, depart from
this rule and first view the subject, briefly, from a philosophical
standpoint. What the student may thus lose in time we hope
he may gain in obtaining a broader view of the fundamental
principles underlying our science. At the same time, the reader
who is unacquainted with the science of philosophy or pure logic,
need not feel alarmed, since not even the most elementary
knowledge of the principles of formal logic is required for the
understanding of the following chapter.
2. Law of Causality. — In a great treatise on the Chinese civiliza-
tion, Oscar Peschel, the German geographer and philosopher,
makes the following remarks: "Since our intellectual awakening,
since we have appeared on the arena of history as the creators
and guardians of the treasures of culture, we have sought after
only one thing, of the presence of which the Chinese had no
idea, and for which they would give hardly a bowl of rice. This
invisible thing we call causality. We have admired a vast
number of Chinese inventions, but even if we seek through their
huge treasures of philosophical writing we are not indebted to
them for a single theory or a single glance into the relation
between cause and effect."
The law of causality may be stated broadly as follows: Every-
thing that happens, and everything that exists, necessarily
happens or exists as the consequence of a previous state of things.
This law cannot be proven. It must be taken, a priori, as an
axiom; but once accepted as a truth it does away with the belief
of a capricious ruling power, and even if the strongest disbeliever
of the law may deny its truth in theory he invariably applies it
in practice during his daily occupation in life.
All future human activity is more or less influenced by past
and present conditions. Modern historical writings, as for
instance the works of the brilliant Italian historian, Ferrero,
always seek to connect past events with present social and
economic conditions. Likewise great and constructive statesmen
in trying to shape the destinies of nations always reckon with
past and present events and conditions. We often hear the term,
"a man with foresight," applied to leading financiers and states-
men. This does not mean that such men are gifted with a vision
of the future, but simply that they, with a detailed and thorough
knowledge of past and present events, associated with the par-
ticular undertaking in which they are interested, have drawn
conclusions in regard to a future state of affairs. For example,
when the Canadian Pacific officials, in the early eighties, chose
Vancouver as the western terminal for the transcontinental
railroad, at a time when practically the whole site of the present
metropolis of western Canada was only a vast timber tract, they
realized that the conditions then prevailing on this particular
spot — the excellent shipping facilities, the favorable location in
regard to the Oriental trade, and the natural wealth of the surrounding
country — would bring forth a great city, and their
predictions came true.
Predictions with regard to the future must be taken seriously
only when they are based upon a thorough knowledge of past
and present events and conditions. Prophecies, taken in a
purely biblical sense of the term and viewed from the law of
causality, are mere guesses which may come true and may not.
A prophet can hardly be called more than a successful guesser.
Whether there have been persons gifted with a purely prophetic
vision is a question which must be left to the theologians to
wrangle over.
3. Hypothetical Judgments. — Any person with ordinary intellectual
faculties may, however, predict certain future events
with absolute certainty by a simple application of the principle
of hypothetical judgment. The typical form of the hypothetical
judgment is as follows: If a certain condition exists, or if a certain
event takes place, then another definite event will surely follow.
Or if A exists, B will invariably follow.
Mathematical theorems are examples of hypothetical judg-
ments. Thus in the geometry of the plane we start with certain
ideas (axioms) about the line and plane. From these axioms
we then deduce the theorems by mere hypothetical judgments.
Thus in the Euclidian geometry we find the axiom of parallel
lines, which assumes that through a point only one line can be
drawn parallel to another given line, and from this assumption
we then deduce the theorem that the sum of the angles in a
triangle is 180°. But it must be borne in mind that this proof is
valid only on the assumption of the actual existence of such lines.
If we could prove directly by logical reasoning or by actual
measurement, that the sum of the angles in any triangle is equal
to 180°, then we would be able to prove the above theorem, the
so-called "hole in geometry," independently of the axiom of
parallel lines.
A Russian mathematician, Lobatschewsky, on the other hand,
assumed that through a single point an infinite number of parallels
might be drawn to a previously given line, and from this as-
sumption he built up a complete and valid geometry of his own.
Still another mathematician, Riemann, assumed that no lines
were parallel to each other, and from this produced a perfectly
valid surface geometry of the sphere.
As examples of hypothetical judgment we have the two follow-
ing well-known theorems from elementary geometry and algebra.
If one of the angles of a triangle is divided into two parts, then
the line of division intersects the opposite side. If a decadian
number is divided by 5 there is no remainder from the division.
In natural science, hypothetical judgments are founded on
certain occurrences (phenomena) which, without exception, have
taken place in the same manner, as shown by repeated obser-
vations. The statement that a suspended body will fall when its
support is removed is a hypothetical judgment derived from
actual experience and observation.
4. Hypothetical Disjunctive Judgments. — In hypothetical
judgments we are always able to associate cause and effect. It
happens frequently, however, that our knowledge of a certain
complex of present conditions and actions is such that we are
not able to tell beforehand the resulting consequences or effects
of such conditions and actions, but are able to state only
that either an event E1 or an event E2, etc., or an event En will
happen. This represents a hypothetical disjunctive judgment
whose typical form is: If A exists, either E1, E2, E3, · · · or En
will happen.
If we take a die, i. e., a homogeneous cube whose faces are
marked with the numbers from one to six, and make an ordinary
throw, we are not able to tell beforehand which side will turn
up. True, we have here again a previous state of things, but the
conditions do not allow such a simple analysis as the cases we
have hitherto considered under the purely hypothetical judgment.
Here a multitude of causes influence the final result — the weight
and centre of gravity of the die, the infinite number of possible
movements of the hand which throws the die, the force of contact
with which the die strikes the table, the friction, etc. All these
causes are so complex that our minds are not afforded an op-
portunity to grasp and distinguish the impulses that determine
the fall of the die. In other words we are not able to say, a
priori, which face will appear. We only know for certain that
either 1, 2, 3, 4, 5, or 6 will appear. If a line is drawn through
the vertex of a triangle, it either intersects the opposite side or
it does not. If a number is divided by 5 the division either gives
only an integral number or leaves a remainder. If an opening
is made in the wall of a vessel partly filled with water, then either
the water escapes or remains in the vessel. All the above cases
are examples of hypothetical disjunctive judgments.
The four cases show, however, a common characteristic. They
all have a certain partial domain, where one of the mutually
exclusive events is certain to happen, while the other partial
domain will bring forth the other event, and the total area of
action embraces both events. Taking the triangle, we notice
that the lines may pass through all the points inside of an angle
of 360°, but only the lines falling inside the internal vertical
angle, A, of the triangle will produce the event in question,
namely the line intersecting the opposite side. There will be
an outflow from the vessel only if the hole is made in that part
of the wall which is touched by the fluid.
All problems do not allow of such simple analysis, however,
as will be seen from the following example. Suppose we have
an urn containing 1 white and 2 black balls and let a person
draw one from the urn. The hypothetical disjunctive judgment
immediately tells us that the ball will be either black or white,
but the particular domain of each event cannot be limited to the
fixed border lines of the former examples. Any one of the balls
may occupy an infinite number of positions, and furthermore we
may imagine an infinite number of movements of the hand which
draws the ball, each movement being associated with a particular
point of position of the ball in the urn. If we now assume each
of the three balls to have occupied all possible positions in the
urn, each point of position being associated with its proper
movement of the hand, it is readily seen that a black ball will
be encountered twice as often as a white ball in a particular
point of position in the urn, and for this reason any particular
movement of the hand which leads to this point of position
grasps a black ball twice as often as a white ball.
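The urn argument above lends itself to a quick check by simulation. The sketch below is an editor's illustration in modern code, not part of the original text; it assumes only the stated contents of the urn (one white, two black balls) and that every ball is equally likely to be grasped.

```python
import random

# Sketch: draw repeatedly from an urn of 1 white and 2 black balls
# and confirm that black turns up about twice as often as white.
URN = ["white", "black", "black"]

def draw(rng: random.Random) -> str:
    """One draw; rng.choice stands in for the untraceable hand movement."""
    return rng.choice(URN)

rng = random.Random(1915)  # arbitrary seed, for reproducibility only
draws = [draw(rng) for _ in range(30_000)]
print(draws.count("black") / len(draws))  # close to 2/3
```

The simulated ratio hovers near 2/3, which is exactly the a priori value obtained in the next section by counting favorable among equally likely cases.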
5. General Definition of the Probability of an Event. — All the
above examples have shown the following characteristics:
(1) A total general region or area of action in which all actions
may take place, this total area being associated with all possible
events.
(2) A limited special domain in which the associated actions
produce a special event only.
If these areas and domains, as in the above cases, are of such a
nature that they allow a purely quantitative determination,
they may be treated by mathematical analysis. We define
now, without entering further into its particular logical signifi-
cance, the ratio of the second special and limited domain to the
first total region or area as the probability of the happening of
the event, E, associated with domain No. 2.
We must, however, hasten to remark that it is only in a com-
paratively few cases that we are able, a priori, to make such a
segregation of domains of actions. This may be possible in
purely abstract examples, as for instance in the example of the
division of the decadian number by 5. But in all cases where
organic life enters as a dominant factor we are unable to make such
sharp distinctions. If we were asked to determine the proba-
bility of an x-year-old person being alive one year from now, we
should be able to form the hypothetical disjunctive judgment:
An x-year-old person will be either alive or dead one year from
now. But a further segregation into special domains as was
the case with the balls in the urn is not possible. Many ex-
tremely complex causes enter into such a determination; the
health of the particular person, the surroundings, the daily life,
the climate, the social conditions, etc. Our only recourse in
such cases is to actual observation. By observing a large
number of persons of the same age, x, we may, in a purely em-
pirical way, determine the rate of death or survival. Such a deter-
mination of an unknown probability is called an empirical proba-
bility. An empirical probability is thus a probability, into the
determination of which actual experience has entered as a domi-
nant factor.
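In modern notation the empirical determination just described reduces to a simple ratio of observed counts. The following sketch is illustrative only; the figures in it are invented for the example, not taken from any mortality table.

```python
# Empirical survival probability as a ratio of observed counts.
# The figures below are invented for illustration.
persons_observed = 10_000   # persons of age x under observation
deaths_in_year = 120        # deaths among them within one year

p_survive = (persons_observed - deaths_in_year) / persons_observed
p_die = deaths_in_year / persons_observed
print(p_survive, p_die)  # 0.988 0.012
```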
6. Equally Likely Cases. — The main difficulty, in the appli-
cation of the above definition of probability, lies in the deter-
mination of the question whether all the events or cases taking
place in the general area of action may be regarded as equally
likely or not. Two diametrically opposite views have here been
brought forward by writers on probabilities. One view is based
upon the principle which in logic is known as the principle of
"insufficient reason," while the other view is based upon the
principle of "cogent reason." The classical writers on the theory
of probability, such as Jacob Bernoulli and Laplace, base the
theory on the principle of insufficient reason exclusively. Thus
Bernoulli declares the six possible cases by the throw of a die to
be equally likely, since "on account of the equal form of all the
faces and on account of the homogeneous structure and equally
arranged weight of the die, there is no reason to assume that any
face should turn up in preference to any other." In one place
Laplace says that the possible cases are "cases of which we are
equally ignorant," and in another place, "we have no reason to
believe any particular case should happen in preference to any
other."
The opposite view, based on the principle of cogent reason,
has been strongly endorsed in an admirable little treatise by the
German scholar, Johannes von Kries.¹ Von Kries requires, first
of all, as the main essential in a logical theory of probability,
that "the arrangement of the equally likely cases must have a
cogent reason and not be subject to arbitrary conditions."
In several illustrative examples, von Kries shows how the
principle of insufficient reason may lead to different and paradox-
ical results. The following example will illustrate the main
points in von Kries's criticism. Suppose we be given the follow-
ing problem: Determine the probability of the existence of human
beings on the planet Mars. By applying the first mentioned
principle our reasoning would be as follows: We have no more
reason to assume the actual existence of man on the planet than
the complete absence. Hence the probability for the non-
existence of a human being is equal to 1/2. Next we ask for the
probability of the presence or non-presence of another earthly
mammal, say the elephant. The answer is the same, 1/2. Now
the probability for the absence of both man and elephant on the
planet is 1/2 × 1/2 = 1/4.² The probability for the absence of a third
mammal, the horse, is also 1/2, or the probability for the absence
of man, elephant, and horse is equal to (1/2)³ = 1/8. Proceeding in
the same manner for all mammals we obtain a very small proba-
¹ "Die Principien der Wahrscheinlichkeitsrechnung," Berlin, 1886.
² See the chapter on multiplication of probabilities.
bility for the complete absence of all mammals on Mars, or a
very large probability, almost equal to certainty, that the planet
harbors at least one mammal known on our planet, an answer
which certainly does not seem plausible. But we might as well
have put the question from the start: what is the probability
of the existence or absence of any one earthly mammal on Mars?
The principle of insufficient reason when applied directly would
here give the answer 1/2, while when applied in an indirect manner
the same method gave an answer very near to certainty.
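Von Kries's objection can be stated numerically. The sketch below is an editor's illustration of the arithmetic in the text: each species contributes an assumed factor of 1/2 for its absence, and the product drives the probability that at least one mammal is present toward certainty.

```python
# Multiplying the "insufficient reason" probability 1/2 of absence
# over n species: the absence of all becomes (1/2)**n, so the
# presence of at least one species approaches certainty.
def p_at_least_one_present(n_species: int) -> float:
    return 1.0 - 0.5 ** n_species

print(p_at_least_one_present(1))   # 0.5   (the direct answer)
print(p_at_least_one_present(3))   # 0.875 (man, elephant, horse)
print(p_at_least_one_present(50))  # nearly 1
```

The direct and indirect applications of the same principle thus give irreconcilable answers, which is precisely the paradox the text describes.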
An urn is known to contain white and black balls, but the
number of the balls of the two different colors is unknown. What
is the probability of drawing a white ball? The principle of
insufficient reason gives us readily the answer 1/2, while the prin-
ciple of cogent reason would give the same answer only if it were
known a priori that there were equal numbers of balls of each
color in the urn before the drawing took place. Since this
knowledge is not present a priori, we are not able to give any
answer, and the problem is considered outside the domain of
probabilities. There is no doubt that the principle advocated
by von Kries is the only logical one to apply, and a recent
treatise on the theory of probability by Professor Bruhns of
Leipzig¹ also gives the principle of cogent reason the most promi-
nent place. On the other hand it must be admitted that if the
principle was to be followed consistently in its very extreme it
would of course exclude many problems now found in treatises
on probability and limit the application of our theory consider-
ably in scope. Still, however, we must agree with von Kries
that it seems very foolhardy to assign cases of which we are
absolutely in the dark, as being equally likely to occur. This
very principle of insufficient reason is in very high degree re-
sponsible for the somewhat absurd answers to questions on the
so-called "inverse probabilities," a name which in itself is a great
misnomer. We shall later in the chapter on "a posteriori"
probabilities discuss this question in detail. At present we shall
only warn the student not to judge cases of which he has no
knowledge whatsoever to be equally likely to occur. The old rule
"experience is the best teacher" holds here, as everywhere else.
¹ "Kollektivmasslehre und Wahrscheinlichkeitsrechnung," Leipzig, 1903.
7. Objective and Subjective Probabilities. — In this connection
it is interesting to note the lucid remarks by the Danish statis-
tician, Westergaard. "By every well arranged game of chance,
by lotteries, dice, etc.," Westergaard says, "everything is ar-
ranged in such a way that the causes influencing each draw or
throw remain constant as far as possible. The balls are of the
same size, of the same wood, and have the same density; they are
carefully mixed and each ball is thus apparently subject to the
influences of the same causes. However, this is not so. Despite
all our efforts the balls are different. It is impossible that they
are of exactly mathematically spherical form. Each ball has its
special deviation from the mathematical sphere, its special size
and weight. No ball is absolutely similar to any one of the
others. It is also impossible that they may be situated in the
same manner in the bag. In short there is a multitude of ap-
parently insignificant differences which determine that a certain
definite ball and none of the other balls may be drawn from the
bag. If such inequalities did not exist one of two things would
happen. Either all balls would turn up simultaneously, or else
they would all remain in the bag. Many of these numerous
causes are so small that they perhaps are invisible to the naked
eye and completely escape all calculations, but by mutual
action they may nevertheless produce a visible result."
It thus appears that a rigorous application of the principle of
cogent reason seems impossible. However, a compromise
between this principle and that of the principle of insufficient
reason may be effected by the following definition of equally
possible cases, viz.: Equally possible cases are such cases in which
we, after an exhaustive analysis of the physical laws underlying the
structure of the complex of causes influencing the special event, are
led to assume that no particular case will occur in preference to any
other. True, this definition introduces a certain subjective
element and may therefore be criticized by those readers who
wish to make the whole theory of probabilities purely objective.
Yet it seems to me preferable to the strict application of the
principle of equal distribution of ignorance. Take again the
question of the probability of the existence of human beings on
the planet Mars. The principle of equal distribution of ignorance
readily gives us without further ado the answer 1/2. Modern astro-
physical researches have, however, verified physical conditions on
the planet which make the presence of organic life quite possible,
and according to such an eminent authority as Mr. Lowell, perhaps
absolutely certain. Yet these physical investigations are as
yet not sufficiently complete, and not in such a form that they
may be subjected to a purely quantitative analysis as far as the
theory of probabilities is concerned. Viewed from the standpoint
of the principle of cogent reason any attempt to determine the
numerical value of the above probability must therefore be put
aside as futile. This result, negative as it is, seems, however,
preferable to the absolute guess of 1/2 as the probability.
CHAPTER II.
HISTORICAL AND BIBLIOGRAPHICAL NOTES.
8. Pioneer Writers. — The first attempt to define the measure
of a probability of a future event is credited to the Greek philos-
opher, Aristotle. Aristotle calls an event probable when the
majority, or at least the majority of the most intellectual persons,
deem it likely to happen. This definition, although not allowing
a purely quantitative measurement, makes use of a subjective
judgment.
The first really mathematical treatment of chance, however, is
given by the two Italian mathematicians, Cardano and Galileo,
who both solved several problems relating to the game of dice.
Cardano, aside from his mathematical occupation, was also a
professional gambler and had evidently noticed that in all kinds
of gambling houses cheating was often resorted to. In order
that the gamester might be fortified against such cheating prac-
tices, Cardano wrote a little treatise on gambling wherein he
discussed several mathematical questions connected with the
different games of dice as played in the Italian gambling houses
at that time. Galileo, although not a professional gambler, was
often consulted by a certain Italian nobleman on several problems
relating to the game of dice, and fortunately the great scholar
has left some of his investigations in a short memoir. In the
same manner the two great French mathematicians, Pascal and
Fermat, were often asked by a professional gamester, the cheva-
lier de Méré, to apply their mathematical skill to the solution of
different gambling problems. It was this kind of investigation
which probably led Pascal to the discovery of the arithmetical
triangle, and the first rudiments of the combinatorial analysis,
which had its origin in probability problems, and which later
evolved into an independent branch of mathematical analysis.
One of the earliest works from the illustrious Dutch physicist,
Huyghens, is a small pamphlet entitled "de Ratiociniis in Ludo
Aleæ," printed in Leyden in the year 1657. Huyghens' tract is
the first attempt at a systematic treatment of the subject. The
famous Leibnitz also wrote on chance. His first reference to a
mathematical probability is perhaps in a letter to the philoso-
pher, Wolff, wherein he discusses the summation of the infinite
series 1 − 1 + 1 − 1 + · · ·. Besides this he solved several problems.
9. Bernoulli, de Moivre and Bayes. — The first extensive
treatise on the theory as a whole is from the hand of the famous
Jacob Bernoulli. Bernoulli's book, "Ars Conjectandi," marks a
revolution in the whole theory of chance. The author treats
the subject from the mathematical as well as from a philo-
sophical point of view, and shows the manifold applications of
the new science to practical problems. Among other important
theorems we here find the famous proposition which has become
known as the Bernoulli Theorem in the mathematical theory of
probabilities. Bernoulli's work has recently been translated
from the Latin into German,¹ and a student who is interested in
the whole theory of probability should not fail to read this
masterly work.
The English mathematicians were the next to carry on the
investigations. Abraham de Moivre, a French Huguenot, and
one of the most remarkable mathematicians of his time, wrote
the first English treatise on probabilities.² This book was cer-
tainly a worthy product of the masterful mind of its author, and
may, even today, be read with useful results, although the
method of demonstration often appears lengthy to the student
who is accustomed to the powerful tools of modern analysis.
The high esteem in which the work by de Moivre is held by
modern writers, is proven by the fact that E. Czuber, the eminent
Austrian mathematician and actuary, so recently as two years
ago translated the book into German. A certain problem (see
Chap. IV) still goes under the name of "The Problem of de
Moivre" in the modern literature on probability. A contem-
porary of de Moivre, Stirling, contributed also to the new branch
of mathematics, and his name also is immortalized in the theory
of probability by the formula which bears his name, and by which
we are able to express large factorials to a very accurate degree
of approximation. The third important English contributor is
¹ Ars Conjectandi, Ostwald's Klassiker No. 108, Leipzig, 1901.
² de Moivre: "The Doctrine of Chances," London, 1781.
the Oxford clergyman, T. Bayes. Bayes' treatise, which was
published after his death by Price, in Philosophical Transactions
for 1764, deals with the determination of the a posteriori proba-
bilities, and marks a very important stepping stone in our whole
theory. Unfortunately the rule known as Bayes' Rule has been
applied very carelessly, and that mostly by some of Bayes' own
countrymen; so the whole theory of Bayes has been repudi-
ated by certain modern writers. A recent contribution by the
Danish philosophical writer, Dr. Kroman, seems, however, to have
cleared up all doubts on the subject, and to have given Bayes his
proper credit.
10. Application to Statistical Data. — In the eighteenth century
some of the most celebrated mathematicians investigated
problems in the theory of probability. The birth of life as-
surance gave the whole theory an important application to
social problems and the increasing desire for the collection of all
kinds of statistical data by governmental bodies all over Europe
gave the mathematicians some highly interesting material to
which to apply their theories. No wonder, therefore, that we
in this period find the names of some of the most illustrious mathe-
maticians of that time, such as Daniel Bernoulli, Euler, Nicolas
and John Bernoulli, Simpson, D'Alembert and Buffon, closely
connected with the solution of problems in the theory of mathe-
matical probabilities. We shall not attempt to give an account
of the different works of these scientists, but shall only dwell
briefly on the labors of Bernoulli and D'Alembert. In a memoir
in the St. Petersburg Academy, Daniel Bernoulli is the first to
discuss the so called St. Petersburg Problem, one of the most
hotly debated in the whole realm of our science. We may here
mention that this problem is today one of the main pillars in the
economic treatment of value. Bernoulli introduced in the dis-
cussion of the above mentioned problem the idea of the "moral
expectation," which under slightly different names appears in
nearly all standard writings on economics.
D'Alembert is especially remembered for the critical attitude
he took towards the whole theory. Although one of the most
brilliant thinkers of his age, the versatile Frenchman made some
great blunders in his attempt to criticize the theories of chance.
Buffon's name is remembered because of the needle problem,
and he may properly be called the father of the so-called "ge-
ometrical" or "local" probabilities.
11. Laplace and Modern Writers. — We now come to that
resplendent genius in the investigation of the mathematical
theory of chance, the immortal Laplace, who in his great work,
"Theorie Analytique des Probabilités," gave the final mathe-
matical treatment of the subject. This massive volume leaves
nothing to be desired and is still today — more than one hundred
years after its first publication — a most valuable mine of in-
formation and compares favorably with much more modern
treatises. But like all mines, it requires to be mined and is by
no means easy reading for a beginner. An elementary extract,
"Essai Philosophique des Probabilités," containing the more
elementary parts of Laplace's greater work and stripped of all
mathematical formulas has recently appeared in an English
translation.
Among later French works, Cournot's "Exposition de la
Theorie des Chances et des Probabilités" (1843), treated the
principal questions in the application of the theory to practical
problems in sociology. In 1837 Poisson published his "Re-
cherches sur les Probabilités" in which he for the first time proved
the famous theorem which bears his name. Poisson and his
Belgian contemporary, Quetelet, made extensive use of the
theory in the treatment of statistical data.
Among the most recent French works, we mention especially
Bertrand's "Calcul des Probabilités" (Paris, 1888), Poincaré's
"Calcul des Probabilités" (Paris, 1896), and Borel's "Calcul des
Probabilités" (Paris, 1901). We especially recommend Poin-
caré's brilliant little treatise to every student who masters the
French language, as this book makes no departure from the
lively and elucidating manner in which this able mathematical
writer treated the numerous subjects on which he wrote during
his long and brilliant career as a mathematician.
Of Russian writers, the mathematician, Tchebycheff, has given
some extensive general theorems relating to the law of large
numbers. Unfortunately Tchebycheff's writings are for the
most part scattered in French, German, Scandinavian and
Russian journals, and thus are not easily accessible to the ordinary
reader. A Russian artillery officer, Sabudski, has recently pub-
lished a treatise on ballistics in German, wherein he extends the
views formulated by Tchebycheff.
Of Scandinavian writers we mention T. N. Thiele, who prob-
ably was the first to publish a systematic treatise on skew curves.¹
An abridged edition of this very original work has recently been
translated into English.² The Dane, Westergaard, is the author
of the most extensive and thorough treatise on vital statistics
which we possess at the present time. Westergaard's work has
recently been translated into German,³ and is strongly recom-
mended to the student of vital statistics on account of its clear
and attractive style of presenting this important subject.
The Swedish mathematicians Charlier and Gylden have
published a series of memoirs in different Scandinavian journals
and scientific transactions. We may also, in this category,
mention the numerous small articles by the eminent Danish
actuary, Dr. Gram.
While the German mathematicians in general are the most
fertile writers on almost every branch of pure and applied mathe-
matics, they have not shown much activity in the theory of
mathematical probability except in the past ten years. But
during that time there have appeared at least a dozen standard
works in German. Among these, the lucid and terse treatise
by E. Czuber, the Austrian actuary and mathematician, is
especially attractive to the beginner on account of the systematic
treatment of the whole subject.⁴ A very original treatment is
offered by H. Bruhns in his "Kollektivmasslehre und Wahrschein-
lichkeitsrechnung" (Leipzig, 1903). Among the German works,
we may also mention the book by Dr. Norman Herz in "Samm-
lung Schubert," and an excellent little work by Hack in the small
pocket edition of "Sammlung Göschen." The theory of skew
curves and correlation is presented by Lipps and Bruhns in
extensive treatises.
¹ "Almindelig Iagttagelseslære," Copenhagen, 1884.
² "Theory of Observations," London, 1903.
³ "Mortalität und Morbilität," Jena, 1902.
⁴ E. Czuber, "Wahrscheinlichkeitsrechnung," Leipzig, 1908 and 1910, 2
volumes.
We finally come to modern English writers on the subject.
After the appearance of de Moivre's "Doctrine of Chances"
the first work of importance was the book by de Morgan "An
Essay on the Theory of Probabilities." The latest text-book is
Whitworth's "Choice and Chance" (Oxford Press, 1904); but
none of these works, although very excellent in their manner of
treatment of the subject, comes up to the French, Scandinavian,
and German text-books. Nevertheless, some of the most im-
portant contributions to the whole theory have been made by
the English statisticians and mathematicians, Crofton, Pearson,
and Edgeworth. Especially have frequency curves and cor-
relation methods introduced by Professor Karl Pearson been
very extensively used in direct applications to statistical and
biological problems. Of purely statistical writers, we may
mention G. Udny Yule, who has published a short treatise en-
titled "Theory of Statistics" (London, 1911). Numerous ex-
cellent memoirs have also appeared in the different English and
American mathematical journals and statistical periodicals,
especially in the quarterly publication, Biometrika, edited by
Professor Karl Pearson.
In the above brief sketch, we have only mentioned the most
important contributors to the theory of probabilities proper.
Numerous able writers have written on the related subject of
least squares, the mathematical theory of statistics and insurance
mathematics. We shall not discuss the works of these inves-
tigators at the present stage. Each of the most important works
in the above mentioned branches will receive a short review in
the corresponding chapters on statistics and assurance mathe-
matics. The readers interested in the historical development of
the theory of probabilities are advised to consult the special
treatises on this subject by Todhunter and Czuber.¹
¹ After this chapter had gone to press I notice that a treatise by the emi-
nent English scholar, Mr. Keynes, is being prepared by The Macmillan Co.
In this connection I wish also to call attention to the recent publication by
Bachelier (Calcul des probabilités, 1912), a work planned on a broad and
extensive scale. — A. F.
CHAPTER III.
THE MATHEMATICAL THEORY OF PROBABILITIES.
12. Definition of Mathematical Probability. — "If our positive
knowledge of the effect of a complex of causes is such that we
may assume, a priori, t cases as being equally likely to occur, but
of which only f, (f < t), cases are favorable in causing the event,
E, in which we are interested, then we define the proper fraction
f/t = p as the mathematical probability of the happening of
the event, E" (Czuber). We might also have defined an a
priori probability as the ratio of the equally favorable cases to
the co-ordinated possible cases.
As is readily seen, this definition assumes a certain a priori
knowledge of the possible and favorable conditions of the event
in question, and the probability thus defined is therefore called
"a priori probability." Denoting the event by the symbol, E,
we express the probability of its occurrence by the symbol P(E),
and the probability of its non-occurrence by P(Ē). Thus if t is
the total number of equally possible cases and f the number of
favorable cases for the event, we have:

P(E) = f/t = p,

and

P(Ē) = (t − f)/t = 1 − f/t = 1 − p = 1 − P(E).

This relation evidently gives us: P(E) + P(Ē) = 1, which is the
symbolic expression for the hypothetical disjunctive judgment
that the event E will either happen or not happen. If f = t, we
have:

P(E) = t/t = 1,

which is the symbol for the hypothetical judgment that if A
exists, E will surely happen. Similarly if f = 0, we get
P(E) = 0/t = 0,

or the symbol for the hypothetical judgment: If A exists, E will
not happen, or, what is the same, Ē will happen.
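The definition and its two limiting cases can be rendered directly in code. The function below is a minimal sketch of the ratio f/t using exact fractions; the names are of my own choosing, not taken from the text.

```python
from fractions import Fraction

# P(E) = f/t for f favorable cases among t equally likely cases.
def probability(favorable: int, total: int) -> Fraction:
    if total <= 0 or not 0 <= favorable <= total:
        raise ValueError("need t > 0 and 0 <= f <= t")
    return Fraction(favorable, total)

p = probability(1, 6)              # one face of a die
assert p + (1 - p) == 1            # P(E) + P(not-E) = 1
assert probability(6, 6) == 1      # f = t: the event is certain
assert probability(0, 6) == 0      # f = 0: the event cannot happen
```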
As we have already mentioned, in an a priori determination of
a probability, special stress must be laid upon the requirement
that all possible cases must be equally likely to occur. The
enumeration of these cases is by no means so easy as may appear
at first sight. Even in the most simple problems, where there
can be no doubt about the possible cases being equally likely to
occur, it is very easy to make a mistake, and some of the most
eminent mathematicians and most acute thinkers have drawn
erroneous conclusions in this respect. We shall give a few ex-
amples of such errors from the literature on the subject of the
theory of probabilities, not on account of their historical interest
alone, but also for the benefit of the novice who naturally is ex-
posed to such errors.
13. Example 1. — An Italian nobleman, a professional gambler
and an amateur mathematician, had, by continued observation
of a game with three dice, noticed that the sum of 10 appeared
more often than the sum of 9. He expressed his surprise at this
to Galileo and asked for an explanation. The nobleman re-
garded the following combinations as favorable for the throw of 9:
1 2 6
1 3 5
14 4
2 2 5
2 3 4
3 3 3
and for the throw of 10 the six combinations of:
1 3 6
1 4 5
2 2 6
2 3 5
2 4 4
3 3 4
Galileo shows in a treatise entitled "Considerazione sopra il
giuoco dei dadi" that these combinations cannot be regarded as
being equally likely. By painting each of the three dice a
different color it is easy to see that an arrangement such as
1 2 6 can be produced in 6 different ways. Let the colors be
white, black and red respectively. We may then make the
following arrangements:
White  Black  Red
  1      2     6
  1      6     2
  2      1     6
  2      6     1
  6      1     2
  6      2     1

which gives 3! = 6 different arrangements. The arrangements
of 1 4 4 can be made as follows:
White Black Red
1 4 4
4 1 4
4 4 1
which gives 3 different arrangements. The arrangements of
3 3 3 can be made in one way only. By complete enumeration
of equally favorable cases we obtain the following scheme:
Sum 9   cases      Sum 10   cases
1,2,6     6        1,3,6      6
1,3,5     6        1,4,5      6
1,4,4     3        2,2,6      3
2,2,5     3        2,3,5      6
2,3,4     6        2,4,4      3
3,3,3     1        3,3,4      3
         25                  27
The total number of equally possible cases by the different
arrangements of the 18 faces on the dice is 6^3 = 216. The
probability of throwing 9 with three dice is therefore 25/216,
of throwing 10 it is 27/216 = 1/8.
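Galileo's enumeration is easy to verify by complete enumeration of the ordered throws; the following short Python sketch (a modern illustration, not part of the original text) counts the cases directly:

```python
from itertools import product

# All 6^3 = 216 ordered throws of three distinguishable dice
throws = list(product(range(1, 7), repeat=3))
nine = sum(1 for t in throws if sum(t) == 9)
ten = sum(1 for t in throws if sum(t) == 10)

print(len(throws), nine, ten)  # 216 cases, 25 favor the sum 9, 27 favor 10
```

The count confirms that 10 is the more probable sum, exactly as the nobleman had observed at the gaming table.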
14. Example 2. — D'Alembert, the great French mathematician
and natural philosopher and one of the ablest thinkers of his
time, assigned 2/3 as the probability of throwing head at least
once in two successive throws with a homogeneous coin.
D'Alembert reasons as follows: If head appears first the game is
finished and a second throw is not necessary. He therefore gives
as equally possible cases (we denote head by H and tail by T):
H, TH, TT, and determines thus the probability as 2/3. Where
then is the error of D'Alembert? At first glance the chain of
reasoning seems perfect. There are altogether three possible
cases of which two are in favor of the event. But are the three
cases equally likely? To throw head in a single throw is evi-
dently not the same as to throw head in two successive throws.
D'Alembert has left out of consideration the fact that a double
throw is allowed. The following analysis shows all the equally
possible cases which may occur:
HH, HT, TH, TT.
Three of those cases favor the event. Hence we have:
P(E) = p = 3/4.
We shall return to this problem at a later stage under the dis-
cussion of the law of large numbers.
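The four equally likely double throws can be enumerated mechanically; a minimal Python sketch (our own illustration):

```python
from itertools import product

# The four equally likely outcomes of two coin throws
outcomes = list(product("HT", repeat=2))  # HH, HT, TH, TT
favorable = [o for o in outcomes if "H" in o]
p = len(favorable) / len(outcomes)
print(p)  # 0.75, not D'Alembert's 2/3
```

Listing the full double throws, rather than stopping the game early, is precisely what restores the equal likelihood of the cases.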
The examples quoted have already shown that the enumer-
ation of the equally likely cases requires a sharp distinction
between the different combinations and arrangements of ele-
ments. In other words, the solution of the problems requires
a knowledge of permutations and combinations. We assume
here that the reader is already acquainted with the elements and
formulas from the combinatorial analysis and shall therefore
proceed with some more illustrations. In the following, when
employing the binomial coefficients, we shall use the notation
C(n, k) instead of ⁿCₖ.
15. Example 3. — An urn contains a white and b black balls. A
person draws k balls. What is the probability of drawing α
white and β black balls?
(α + β = k, α ≤ a, β ≤ b)
k balls may be drawn from the urn in as many ways as it is possible
to select k elements from a + b elements, which may be done in
C(a + b, k) ways. Furthermore there are C(a, α) groups of α
white and C(b, β) groups of β black balls. Since each combination
of any one group of the first groups with any one group of the
second groups is favorable for the event, we have as favorable
cases C(a, α) × C(b, β). Hence:

P(E) = C(a, α) C(b, β) / C(a + b, k).
Example 4. — A special case of the above problem is the
following question which often appears in the well known game of
whist. What are the respective chances that 0, 1, 2, 3, 4 aces
are held by a specified player? There are altogether 52 cards
in the game equally distributed among 4 players. Of these
cards 4 are aces and 48 are non-aces. Hence we have the
following values for a, b, k, α and β:

a = 4, b = 48, k = 13, α = 0, 1, 2, 3, 4, β = 13, 12, 11, 10, 9.
Substituting in the above formula we get:

p0 = C(4, 0) C(48, 13) / C(52, 13) =  82251/270725,

p1 = C(4, 1) C(48, 12) / C(52, 13) = 118807/270725,

p2 = C(4, 2) C(48, 11) / C(52, 13) =  57798/270725,

p3 = C(4, 3) C(48, 10) / C(52, 13) =  11154/270725,

p4 = C(4, 4) C(48, 9) / C(52, 13) =    715/270725.
A hypothetical disjunctive judgment immediately tells us that in
a game of whist a specified player must either hold 0, 1, 2, 3 or
4 aces. Any such judgment is certain to come true. Hence by
adding the 5 above computed probabilities we obtain a check
for the accuracy of our calculations. The actual addition of the
numerical values of p0, p1, p2, p3, and p4 gives us unity which is
the mathematical symbol for certainty. Gauss, the renowned
German mathematician and astronomer, was an eager whist
player. During his forty-eight years of residence in the university
town of Göttingen almost every evening he played a rubber of
whist with some friends among the university professors. He
kept a careful record of the distribution of the aces in each
game. After his death these records were found among his
papers, headed "Aces in Whist." The actual records agree
with the results computed above.
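These five chances follow the hypergeometric formula of Example 3, and both the exact values and the unity check can be reproduced with exact rational arithmetic; in this modern sketch (our own), `math.comb` plays the role of the binomial coefficients C(n, k):

```python
from fractions import Fraction
from math import comb

# P(alpha aces among 13 cards dealt from 52, of which 4 are aces)
def p_aces(alpha):
    return Fraction(comb(4, alpha) * comb(48, 13 - alpha), comb(52, 13))

probs = [p_aces(a) for a in range(5)]
print(float(probs[1]))  # the single-ace chance, about 0.4388
print(sum(probs))       # exactly 1: the check of the disjunctive judgment
```

The exact sum of 1 is the same certainty check carried out above by adding the five fractions over the common denominator 270725.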
16. Example 5. — An urn contains n similar balls. A part of
or all the balls are drawn. What is the probability of drawing
an even number of balls?
One ball may be drawn in as many ways as there are balls,
two balls in as many ways as we may select two elements out of
n elements, and so on. Hence we have for the total number of
equally possible cases:

t = C(n, 1) + C(n, 2) + C(n, 3) + ... + C(n, n).

We have now:

(1 + 1)^n = 1 + C(n, 1) + C(n, 2) + ... + C(n, n),

and

(1 − 1)^n = 1 − C(n, 1) + C(n, 2) − ... + (−1)^n C(n, n).

The number of favorable cases is given by the expansion:

f = C(n, 2) + C(n, 4) + C(n, 6) + ...

The expression for t is the binomial expansion of (1 + 1)^n less
unity. Hence we have:

t = (1 + 1)^n − 1 = 2^n − 1.
If we add the two expansions of (1 + 1)^n and (1 − 1)^n and then
subtract 2 we get the expansion for 2f. Hence we have:

2f = (1 + 1)^n + (1 − 1)^n − 2,  whence  f = 2^(n−1) − 1.
Thus we shall have as the probability of drawing an even number
of balls:

p = (2^(n−1) − 1)/(2^n − 1),

while for an uneven number:

q = 1 − p = 2^(n−1)/(2^n − 1).
We notice that the probability of drawing an uneven number of
balls is larger than the probability of drawing an even number.
This apparently strange result is easily explained without the
aid of algebra from the fact that when the urn contains one ball
only, we cannot draw an even number. Hence we have p = 0,
q = 1. With two balls we may draw an uneven number in two
ways and an even number in one way, thus p = 1/3 and q = 2/3.
The greater weight of q remains as long as n is finite; only when
n = ∞ is p = q = 1/2.
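The closed forms for f and t can be checked against direct enumeration for small n; a brief Python sketch (ours, not the book's):

```python
from math import comb

# Favorable and possible cases for drawing an even number of balls
# from an urn of n balls, by summing the binomial coefficients directly
def even_draw_counts(n):
    t = sum(comb(n, k) for k in range(1, n + 1))   # 2^n - 1 possible draws
    f = sum(comb(n, k) for k in range(2, n + 1, 2))  # even-sized draws
    return f, t

for n in (1, 2, 5, 10):
    f, t = even_draw_counts(n)
    assert f == 2**(n - 1) - 1 and t == 2**n - 1  # the closed forms above
    print(n, f, t)
```

For n = 1 the pair is (0, 1), i.e. p = 0 and q = 1, exactly the limiting case discussed above.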
17. Example 6. — A box contains n balls marked 1, 2, 3, • • • n.
A person draws n balls in succession and none of the balls thus
drawn is put back in the urn. Each drawing is consecutively
marked 1, 2, 3, • • • n on n cards. What is the probability that
no ball marked a (a = 1, 2, 3, • • • n) appears simultaneously
with a drawing card marked a?
The number of equally possible cases is simply the number of
permutations of n elements which is equal to n!
The number of favorable cases is given by the total number
of derangements or relative permutations of n elements, i. e.,
such permutations wherein the numbers from 1 to n do not appear
in their natural places. The formula for such relative permuta-
tions was first given by Euler in a memoir of the St. Petersburg
Academy entitled "Quaestio Curiosa ex Doctrina Combina-
tionis." Euler makes use of a recursion formula. A German
mathematician, Lampe, has, however, derived the formula in a
simpler manner in "Grunert's Archives" for 1884.
Lampe denotes by the symbol φ(1)ₙ the number of permutations
of n elements wherein 1 does not appear in its natural place. By
letting 1 remain fixed in the first place we obtain (n − 1)!
permutations of the other remaining elements, or:

φ(1)ₙ = n! − (n − 1)!

permutations where 1 is out of place. Of these permutations
there are, however, a number wherein 2 appears in its natural
place. If we let 2 remain fixed in this place we shall have:

φ(1)ₙ₋₁ = (n − 1)! − (n − 2)!

permutations wherein 2 is in its place but 1 out of place; there
remain thus:

φ(2)ₙ = φ(1)ₙ − φ(1)ₙ₋₁ = n! − 2(n − 1)! + (n − 2)!

permutations in which neither 1 nor 2 is in its natural place.
Letting 3 remain fixed in its place, the remaining n − 1 elements
give:

(n − 1)! − 2(n − 2)! + (n − 3)!

cases where 3 is in its place but 1 and 2 are not. Accordingly
there will be:

φ(3)ₙ = φ(2)ₙ − φ(2)ₙ₋₁ = n! − 3(n − 1)! + 3(n − 2)! − (n − 3)!

permutations in which none of the three elements 1, 2, and 3 is
in its place. The complete deduction gives us now for the
number r:

φ(r)ₙ = n! − C(r, 1)(n − 1)! + C(r, 2)(n − 2)! − ... + (−1)^r C(r, r)(n − r)!

arrangements in which none of the numbers 1, 2, 3, ... r is in
its place. Hence the required probability is:

p = φ(r)ₙ/n! = 1 − C(r, 1)/n + C(r, 2)/[n(n − 1)] − ...
    + (−1)^r C(r, r)/[n(n − 1) ... (n − r + 1)].
When n = r the above expression becomes:

p = 1 − 1/1! + 1/2! − 1/3! + ... + (−1)^n/n!,

or the probability that none of the balls appears in its numerical
order.
When n = ∞ the above expression converges towards e^(−1) as
a limit. Since the series is rapidly convergent, we may therefore
as an approximate value let

p = e^(−1) = 0.36788....

The probability that at least one ball appears in numerical order
is

q = 1 − p = 0.63212....
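Euler's alternating sum can be checked by brute force against the actual derangements for small n; a short Python sketch (a modern illustration of the formula just derived):

```python
from itertools import permutations
from math import comb, factorial

# Euler's formula: number of derangements of n elements
def phi(n):
    return sum((-1)**k * comb(n, k) * factorial(n - k) for k in range(n + 1))

for n in range(1, 8):
    # brute-force count of permutations with no element in its place
    brute = sum(1 for p in permutations(range(n))
                if all(p[i] != i for i in range(n)))
    assert brute == phi(n)

print(phi(7) / factorial(7))  # already close to e^-1 = 0.36788...
```

Even at n = 7 the probability agrees with e^(−1) to four decimal places, illustrating how rapidly the series converges.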
CHAPTER IV.
THE ADDITION AND MULTIPLICATION THEOREMS IN
PROBABILITIES.
18. Systematic Treatment by Laplace. — The reader will readily
have noticed that the problems hitherto considered have been
solved by a direct application of the fundamental definition of a
mathematical probability. Almost every branch of pure and
applied mathematics has originated in this manner. A few
isolated problems, apparently having no mutual connection what-
soever, have presented themselves to different mathematicians.
As the number of problems increased, there was found to exist
a certain inner relation between them, and from the mere isolated
cases there grew a systematic treatment of an entirely new
subject.
The theory of probabilities had its origin in games; and the
different problems that arose, were treated individually. From
the time of Galileo and Cardano to the appearance of Laplace's
great treatise, a number of celebrated mathematicians such as
Pascal, Fermat, Huyghens, De Moivre, Stirling, Bernoulli and
others had solved numerous problems, some of these, as we already
have seen in the preceding chapter, of a quite complex nature.
But none of these mathematicians had hitherto succeeded in
giving a systematic treatment of the subject as a whole. All
their treatises were, as any one taking the trouble to look over
the works of De Moivre and Bernoulli will readily notice, mere
collections of examples solved by direct application of our funda-
mental definition. It remained for Laplace first to give the
definite rules to the science by which the solution of a great
number of problems, often very complicated, was reduced to
the application of a few stable principles, first given in his
"Tbéorie Analytique des Probabilités" (Paris, 1812).
19. Definition of Technical Terms. — Before entering into a
demonstration of Laplace's theorems it will, however, be neces-
sary to explain a few technical terms which seem commonplace
and simple enough but which, nevertheless, must be defined
clearly in order to avoid any ambiguity.
In all works on probabilities, when speaking of the happenings of
various events, we often encounter the terms independent events,
dependent events and mutually exclusive events. An event E is
said to be independent of another event F when the actual
happening of F does not influence in any degree whatsoever the
probability of the happening of E. On the other hand, if the
probability of E is dependent on or influenced by the previous
happening of F, then E is said to be dependent on F. Finally the
two events E and F are said to be mutually exclusive when
through the occurrence of one of them, say F, the other event
E cannot take place, or vice versa. We might also in this case
consider the two events E and F as members of a complete dis-
junction. In a complete hypothetical disjunctive judgment as
"When a die is thrown either 1, 2, 3, 4, 5 or 6^ will turn up''
each member represents a possible event. Any one of these
events is mutually exclusive in respect to the other events of the
disjunction.
20. The Theorem of the Complete or Total Probability, or the
Probability of "Either Or." — When an event, E, may happen in
any one of the n different and mutually exclusive ways E1, E2,
E3, ... En with the respective probabilities p1, p2, p3, ... pn,
then the probability for the happening of the event, E, is equal
to the sum of the individual probabilities p1, p2, p3, ... pn.
Proof: The main event, E, falls in n groups of subsidiary events
of which only one can happen in a single trial but of which any
one will bring forth the event E. Let us by t denote the total
number of equally possible cases. Of these possible cases f are
in favor of the event. This favorable group of cases may now
be divided into n sub-groups of which f1 are favorable for the
happening of E1, f2 in favor of E2, f3 in favor of E3, ... fn in
favor of En. We may then write:

P(E) = p = f/t = f1/t + f2/t + f3/t + ... + fn/t.
Each of the fractions fₐ/t (a = 1, 2, 3, ... n) represents the
respective probability for the actual occurrence of the subsidiary
events E1, E2, E3, ... En. Hence we shall have

P(E) = p = p1 + p2 + p3 + ... + pn.

This theorem is also known as the Addition Theorem of
probabilities. Instead of "total probability" the German scholar,
Reuschle, has suggested the expressive name of the "either or"
probability. The term is well selected when we remember that
the event, E, will happen when either E1, or E2, or E3, ... or En
happens.
Example 7. — What is the probability to throw 8 with two dice
in a single throw?
The total number of ways is t = 6^2 = 36. The event in
question, E, is composed of the three subsidiary events favoring
the combination of 8:

E1: 6, 2
E2: 5, 3
E3: 4, 4.

Now

P(E1) = 2/36 = 1/18,  P(E2) = 2/36 = 1/18,  P(E3) = 1/36.

Hence

P(E) = 1/18 + 1/18 + 1/36 = 5/36.
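The decomposition into mutually exclusive ordered cases can be made explicit by enumeration; a small Python sketch (our own):

```python
from itertools import product

# Ordered throws of two dice whose faces sum to 8
cases = [t for t in product(range(1, 7), repeat=2) if sum(t) == 8]
print(cases)            # (2,6), (3,5), (4,4), (5,3), (6,2)
print(len(cases))       # 5 of the 36 equally possible cases, i.e. 5/36
```

Note that the combinations 6,2 and 5,3 each contribute two ordered cases while 4,4 contributes only one, which is why the three subsidiary probabilities above are 2/36, 2/36 and 1/36.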
21. Theorem of the Compound Probability or the Probability
of "As Well As." — An event E may happen when every one of
the subsidiary events E1, E2, E3, ... En has occurred
previously. It is immaterial if the n subsidiary events have
happened simultaneously or in succession. But it makes a
difference if the events E1, E2, E3, ... En are independent, or
dependent on each other.
1. Independent Events. — The probability, P(E) = p, for the
simultaneous or consecutive appearance of several independent
events E1, E2, ... En is equal to the product p1 · p2 · p3 ···
pn of the individual probabilities of the n events.
Proof: Let the number of possible cases entering into the
complex that brings forth the event E be t. Each of the t1
possible cases corresponding to the event E1 may occur
simultaneously with each one of the t2 cases corresponding to the
event E2. Thus we have altogether t1 × t2 cases falling on E1
and E2 at the same time. Continuing in the same way of reasoning
it is readily seen that the total number of equally possible cases
resulting from the simultaneous occurrence of the events E1, E2,
E3, ... En is equal to t1 × t2 × t3 × ... × tn. By applying the
same reasoning to the favorable cases we get as their total number:

f = f1 × f2 × f3 × ... × fn.

Hence the final probability for the happening of the simultaneous
or consecutive appearance of the n minor events is:

P(E) = p = f1/t1 × f2/t2 × ... × fn/tn = p1 × p2 × ... × pn.
Example 8. — A card is drawn from a whist deck, another card
is drawn from a pinochle deck. What is the probability that
they both are aces?
A whist deck contains 52 cards of which four are aces, a
pinochle deck 48 cards with 8 aces. Denoting the probabilities
of getting an ace from the whist and pinochle decks by P(E1)
and P(E2) respectively we have:

P(E) = P(E1)P(E2) = 4/52 × 8/48 = 1/78.
2. Dependent Events. — The n events E1, E2, E3, ... En are
not independent of each other, but are related in such a way that
the appearance of E1 influences E2, that event influences in turn
E3, E3 the event E4, and so on.
The same reasoning holds as above, and

P(E) = p = p1 × p2 × p3 × ... × pn.

But p2 means here the probability for the happening of E2 after
the actual occurrence of E1, p3 the probability for the happening
of E3 after E1 and E2 have previously happened, and so on for
all n events.
Example 9. — A card is drawn from a whist deck and replaced
by a joker, and then a second card is drawn. What is the
probability that both cards are aces?
Denoting the two subsidiary events by E1 and E2 we have:

P(E) = P(E1)P(E2) = 4/52 × 3/52 = 12/2704 = 3/676.
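Both the independent case of Example 8 and the dependent case of Example 9 reduce to a single product; exact fractions make the contrast clear in this illustrative Python sketch (ours):

```python
from fractions import Fraction

# Example 8: independent draws from a 52-card whist deck (4 aces)
# and a 48-card pinochle deck (8 aces)
p_indep = Fraction(4, 52) * Fraction(8, 48)

# Example 9: dependent draws from one whist deck, the first card
# being replaced by a joker (still 52 cards, but only 3 aces remain)
p_dep = Fraction(4, 52) * Fraction(3, 52)

print(p_indep, p_dep)  # 1/78 and 3/676
```

In the dependent case the second factor, 3/52, is the probability of E2 after E1 has actually occurred, exactly as required by the second form of the theorem.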
The two above theorems are known as the multiplication theorems
in probabilities. Reuschle has also suggested the name "the
as well as probability."
22. Poincaré's Proof of the Addition and Multiplication
Theorems. — The French mathematician and physicist, H.
Poincaré, has derived the above theorems in a new and elegant
manner in his excellent little treatise "Leçons sur le Calcul des
Probabilités," Paris, 1896.
Poincaré's proof is briefly as follows:
Let E1 and E2 be two arbitrary events.
E1 and E2 may happen in α different ways.
E1 may happen but not E2 in β different ways.
E2 may happen but not E1 in γ different ways.
Neither E1 nor E2 will happen in δ different ways.
We assume the total α + β + γ + δ cases to be equally likely to
occur.
The probability for the occurrence of E1 is

p1 = (α + β)/(α + β + γ + δ).

The probability for the occurrence of E2 is

p2 = (α + γ)/(α + β + γ + δ).

The probability for the occurrence of at least one of the events E1
and E2 is

p3 = (α + β + γ)/(α + β + γ + δ).

The probability for the occurrence of both E1 and E2 is

p4 = α/(α + β + γ + δ).

The probability for the occurrence of E1 when E2 has already
occurred is

p5 = α/(α + γ).

The probability for the occurrence of E2 when E1 has already
occurred is

p6 = α/(α + β).

The probability for the occurrence of E1 when E2 has not
occurred is

p7 = β/(β + δ).

The probability for the occurrence of E2 when E1 has not
occurred is

p8 = γ/(γ + δ).
We have now the following identical relations:

p1 + p2 = p3 + p4,  or  p3 = p1 + p2 − p4,

i. e., the probability that of two arbitrary events at least one
will happen is equal to the probability that the first will happen
plus the probability that the second will happen less the
probability that both will happen. The particular problem which
we may happen to investigate may possibly be of such a nature
that the two events E1 and E2 cannot happen at the same time;
in that case p4 = 0, and we get:

p3 = p1 + p2.

In this equation we immediately recognize the addition theorem
for two mutually exclusive events. By substitution of the
proper values we have furthermore:

p4 = p2 · p5  or  p4 = p1 · p6.

These equations contain the theorems proved under § 21, of
the probability for two mutually dependent events.
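Poincaré's identities hold for any choice of case counts; the following Python sketch checks them with one arbitrary (assumed) set of values for α, β, γ, δ:

```python
from fractions import Fraction

# Poincaré's four-way classification, with assumed case counts
a, b, g, d = 3, 5, 2, 7           # alpha, beta, gamma, delta
N = a + b + g + d
p1 = Fraction(a + b, N)           # P(E1)
p2 = Fraction(a + g, N)           # P(E2)
p3 = Fraction(a + b + g, N)       # P(at least one of E1, E2)
p4 = Fraction(a, N)               # P(both E1 and E2)
p5 = Fraction(a, a + g)           # P(E1 given E2)
p6 = Fraction(a, a + b)           # P(E2 given E1)

assert p3 == p1 + p2 - p4         # the general addition identity
assert p4 == p2 * p5 == p1 * p6   # the multiplication theorem
print("identities hold")
```

Since each probability is a ratio of the same case counts, the identities are algebraic and hold whatever non-negative values α, β, γ, δ take.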
23. Relative Probabilities. — We shall now finally give an
alternative demonstration of the same two theorems. It will, of
course, be of benefit to the student to see the subject from as
many view points as possible; moreover, the following remarks
will contain some very useful hints for the solution of more
complicated problems by the application of so-called "relative
probabilities" and a few elementary theorems from the calculus of
logic. The following paragraphs are mainly based upon a
treatise in the Proceedings of the Royal Academy of Saxony by
the German mathematician and actuary, F. Hausdorff.
In our fundamental definition of a mathematical probability
for the happening of an event E, expressed in symbols by P(E),
as the ratio of the equally favorable and equally possible cases
resulting from a general complex of causes, we were able to
compute the so-called ordinary or absolute probabilities. But
if we, from among the favorable cases and possible cases, select
only such as bring forward a certain different event, say F, then
we obtain the "relative probability" for the happening of E
under the assumption that the subsidiary event, F, has occurred
previously. For this relative probability we shall employ the
symbol P_F(E), which reads "the relative probability of E,
posito F." The following problem illustrates the meaning of
relative probabilities. If an honor card is drawn from an
ordinary deck of cards, what is the probability that it is a king?
Denoting the subsidiary event of drawing an honor card by F,
and the main event of drawing a king by E, we may write the
above mentioned probability in the symbolic form: P_F(E). If
on the other hand we knew a priori that a king was drawn, we
may also ask for the probability of having drawn an honor card.
Since any king also is an honor card, we may write in symbols:

P_E(F) = 1.
Before entering upon the immediate determination of relative
probabilities we shall first define a few symbols from the calculus
of logic. We denote first of all the occurrence of an event E
by E, the non-occurrence of the same event by Ē. Similarly
we have for the occurrence and non-occurrence of other events,
F, G, H, ... and F̄, Ḡ, H̄, .... E + F means that at least one
of the two events E and F will happen. E × F or simply E · F
means the occurrence of both E and F. From the above
definitions it follows immediately that the non-occurrence of
E + F is Ē · F̄, and that

E = E · F + E · F̄.

This last relation simply states that E will happen when either
E and F happen simultaneously or when E and the non-appearance
of F happen at the same time. If furthermore F1, F2, ... Fn
constitute the members of a complete disjunction, i. e., mutually
exclusive events, we have in general:

E = E · F1 + E · F2 + ... + E · Fn.

From the original definition of a probability, it follows now:

P(E) = P(E · F) + P(E · F̄),

and

P(E) = P(E · F1) + P(E · F2) + ... + P(E · Fn),

i. e., the probability that of several mutually exclusive events
one at least will happen is the sum of the probabilities of the
happening of the separate events. This is the symbolic form for
the addition theorem.
24. Multiplication Theorem. — We next take two arbitrary
events. From these events we may form the following
combinations:

E · F,  E · F̄,  Ē · F,  Ē · F̄,

i. e.,
Both E and F happen,
E happens but not F,
F happens but not E,
Neither E nor F happens.

Furthermore let α, β, γ, δ be the respective numbers of the
favorable cases for the above four combinations of the events
E and F. Following the previous method of Poincaré, we shall
have:

P(E) = (α + β)/(α + β + γ + δ),  P(F) = (α + γ)/(α + β + γ + δ),

P_F(E) = α/(α + γ),  P_E(F) = α/(α + β),

P(E · F) = α/(α + β + γ + δ).
25. Probability of Repetitions. — From the above equations it
immediately follows:

P(E · F) = P(E) × P_E(F) = P(F) × P_F(E),

which is the symbolic form for the multiplication theorems of
compound probabilities.
In special cases it may happen that the different subsidiary
events E1, E2, E3, ... En are all similar. We shall then have,
following the symbolic method:

E = E1 · E2 · E3 ··· En = E1 · E1 · E1 ··· E1 = E1^n,

and

P(E) = P(E1^n) = P(E1)^n.

This gives us the following theorem:
The probability for the repetition n times of a certain event,
E, is equal to the nth power of its absolute probability.
Thus if P(E) = p we have immediately P(Ē) = 1 − p,

P(E^n) = P(E)^n = p^n,

P(Ē^n) = P(Ē)^n = (1 − p)^n.

Thus the probability for the occurrence of E at least once in
n trials is

P(E + E + ... n times) = 1 − P(Ē^n) = 1 − (1 − p)^n.
Denoting the numerical quantity of this probability by Q we
have:

1 − Q = (1 − p)^n.

Solving this equation for n we shall have:

n = log (1 − Q) / log (1 − p).
Whenever n equals, or is greater than, the above logarithmic
value for given values of Q and p, we are sure that the probability
of at least one occurrence will exceed the previously given proper
fraction Q. To illustrate:
Example 10. — How often must a die be thrown so that the
probability that a six appears at least once is greater than 1/2?
Here p = 1/6, Q = 1/2. Hence we must select for n the smallest
positive integer satisfying the relation:

n ≥ log (1 − 1/2) / log (1 − 1/6) = log 2 / (log 6 − log 5)
  = .301030/.079181,  i. e., n = 4.

For this particular value of n we have in reality:

Q = 1 − (5/6)^4 = .518.
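The logarithmic bound can be evaluated mechanically; a minimal Python sketch (the function name is our own):

```python
from math import ceil, log

# Smallest n with 1 - (1 - p)^n > Q, i.e. n > log(1 - Q) / log(1 - p)
# (ceil suffices here since the ratio is not an exact integer)
def throws_needed(p, Q):
    return ceil(log(1 - Q) / log(1 - p))

n = throws_needed(1/6, 1/2)
print(n, 1 - (5/6)**n)  # n = 4 gives a probability of about 0.518
```

Three throws give only 1 − (5/6)^3 ≈ 0.42, so four is indeed the smallest number that makes an even-money bet favorable.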
26. Application of the Addition and Multiplication Theorems
in Problems in Probabilities. — We shall next proceed to illustrate
the theorems of the preceding paragraphs by a few examples.
First, we shall apply the demonstrated theorems to some of the
examples we have already solved by a direct application of the
fundamental definition of a mathematical probability.
Example 11. — We take first of all our old friend, the problem
of D'Alembert. What is the probability of throwing head at
least once in two successive throws with a uniform coin?
This problem is most easily solved by finding the probability
first for not getting head in two successive throws. By the
multiplication theorem this probability is: p = 1/2 × 1/2 = 1/4.
Then the probability to get head at least once is 1 − 1/4 = 3/4
from a simple application of the rule in § 25. A more lengthy
analysis is as follows. Denoting the event by E, the following
cases may appear which may bring forth the desired event:
Head in first throw which we shall denote by H1 and head in
second throw which we denote by H2, or head in first throw (H1)
and tail in second (T2), or finally tail in first (T1) and head in
second (H2). Then we have:

E = H1 · H2 + H1 · T2 + T1 · H2,

or:

P(E) = P(H1) · P(H2) + P(H1) · P(T2) + P(T1) · P(H2)
     = 1/4 + 1/4 + 1/4 = 3/4.
27. Example 12. — What is the probability of throwing at
least twelve in a single throw with three dice? The expected
event occurs when either 12, 13, 14, ... or 18 is thrown. Of
these events only one may happen at a time. We may,
therefore, apply the addition theorem and obtain as the total
probability:

p = p12 + p13 + p14 + ... + p18,

where p12, p13, ... p18 are the respective probabilities for throwing
the sums of 12, 13, ... or 18. These subsidiary probabilities may
be determined by complete enumeration as in § 13 under the
problem of Galileo, which gives p = 81/216 = 3/8.
28. Example 13. — An urn contains a white, b black and c red
balls. A single ball is drawn α + β + γ times in succession,
and the ball thus drawn is replaced before the next drawing takes
place. To determine the probability that (1) there are drawn
first α white, then β black and finally γ red balls, (2) the drawn
balls appear in three closed groups of α white, β black and γ red
balls, but the order of these groups is arbitrary, (3) that white,
black and red balls appear in the same numbers as above, but in
any order whatsoever.
1. Denoting the three subsidiary events of drawing α white,
β black and γ red balls by F1, F2 and F3, and the main event of
drawing the balls in the prescribed order by E, we may write the
probability for the occurrence of the main event in the following
symbolic form involving relative probabilities:

P(E) = P(F1) P_F1(F2) P_F1F2(F3).

Substituting the algebraic values for P(F1), P(F2) and P(F3)
in the expression for P(E), and then applying Hausdorff's rule
(§ 24) we get:

P(E) = p1 = a^α/(a + b + c)^α × b^β/(a + b + c)^β × c^γ/(a + b + c)^γ
          = a^α b^β c^γ / (a + b + c)^(α+β+γ).
2. In the second part of the problem the order of the three
different groups is immaterial. The three subsidiary events
F1, F2 and F3 may therefore be arranged in any order whatsoever.
The total number of arrangements is 3! = 6. The probability
of the happening of any one of these arrangements separately
is the same as the probability computed under (1). By applying
the addition theorem we get therefore as the probability of the
occurrence of this event:

p2 = 6 a^α b^β c^γ / (a + b + c)^(α+β+γ).
3. The third part is more easily solved by a direct application
of the definition of a mathematical probability. The order of
the balls drawn is here immaterial. Of each individual
combination of α white, β black and γ red balls it is possible
to form (α + β + γ)!/(α! β! γ!) different permutations, so that
the number of favorable cases is (α + β + γ)!/(α! β! γ!) ×
a^α b^β c^γ, while the total number of equally possible cases is
(a + b + c)^(α+β+γ). Hence we have:

p3 = (α + β + γ)!/(α! β! γ!) × a^α b^β c^γ / (a + b + c)^(α+β+γ).
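The three answers differ only by the number of admissible orderings; the following Python sketch evaluates them for one assumed set of counts (a = 3, b = 4, c = 5, drawing α = 2 white, β = 1 black, γ = 2 red):

```python
from fractions import Fraction
from math import factorial

# Example 13 with assumed counts (drawing with replacement)
a, b, c = 3, 4, 5
al, be, ga = 2, 1, 2
n = a + b + c
p1 = Fraction(a**al * b**be * c**ga, n**(al + be + ga))
p2 = 6 * p1                      # 3! orderings of the three closed groups
multi = factorial(al + be + ga) // (factorial(al) * factorial(be) * factorial(ga))
p3 = multi * p1                  # any order whatsoever

print(p1, p2, p3)
```

Here the multinomial factor is 5!/(2! 1! 2!) = 30, so p3 = 30 p1, the largest of the three probabilities, as it admits the most arrangements.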
29. Example 14. — In an urn are n balls among which are α
white and β black. What is the probability in three successive
drawings to draw (1) first two white and then one black ball, (2)
two white and one black ball in any order whatsoever?
(α + β ≤ n.)
The probability to draw first one white, then another white and
finally a black ball is:

p1 = α/n × (α − 1)/(n − 1) × β/(n − 2).

The probability for any of the other arrangements is the same,
or we have for (2):

p2 = 3 α(α − 1)β / [n(n − 1)(n − 2)].
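A brute-force check over all ordered draws confirms the factor 3 for the three orders WWB, WBW, BWW; the counts below are assumed for illustration:

```python
from fractions import Fraction
from itertools import permutations

# Example 14 with assumed counts: n = 10 balls, alpha = 4 white, beta = 3 black
n, alpha, beta = 10, 4, 3
p1 = Fraction(alpha, n) * Fraction(alpha - 1, n - 1) * Fraction(beta, n - 2)
p2 = 3 * p1  # the three equally likely orders of two whites and one black

# brute-force check over all ordered draws of three distinct balls
balls = ["w"] * alpha + ["b"] * beta + ["x"] * (n - alpha - beta)
draws = list(permutations(range(n), 3))
hits = sum(1 for d in draws if sorted(balls[i] for i in d) == ["b", "w", "w"])
assert Fraction(hits, len(draws)) == p2
print(p1, p2)
```

With these counts p1 = 1/20 and p2 = 3/20, and the enumeration over all 720 ordered draws agrees exactly.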
30. Example 15. — What is the chance to throw a doublet of
6 at least once in n consecutive throws with two dice? (Pascal's
Problem.)
Chevalier de Méré, a French nobleman and a great friend of all
games of chance, went more deeply into the complex of causes in
different games than most of the ordinary gamblers of his time.
Although not a proficient mathematician, he understood enough,
nevertheless, to propose some very interesting problems, for
which he got the ideas from the gambling resorts he frequented.
De Méré was a friend of the great French mathematician and
philosopher, Blaise Pascal, and went to him whenever he wanted
information on some apparently obscure point in the different
games in which he participated. The chevalier had from patient
observation noticed that he could profitably bet to throw a six
at least once in four throws with a single die. He reasoned now
that the number of throws needed to throw a doublet at least
once with two dice ought to be proportional to the corresponding
number of possible cases. For one die there are
6 possible cases, for two dice 36. Thus de Méré thought he could
safely bet to throw a doublet of 6 in 24 throws with two dice.
An actual trial by several games of dice proved extremely
disastrous to the finances of the nobleman, who then went to
Pascal for an explanation. Pascal solved the problem by a direct
application of the definition of a mathematical probability. We
shall, however, solve it by an application of the multiplication
theorem.
The probability to get a doublet of 6 in a single throw is 1/36.
The probability of not getting a double six is therefore
1 − 1/36 = 35/36. The probability of the happening of this event
n consecutive times is (35/36)^n. Thus the probability of getting
a double six at least once in n throws with two dice becomes:
p = 1 − (35/36)^n. Solving this equation for n we shall have:

n = log (1 − p) / (log 35 − log 36);

for p = 1/2 we shall have:

n = log 2 / (log 36 − log 35) = 24.6....
Only for 25 throws or more may we safely bet one to one, while
for 24 throws such a bet was unfavorable. This shows the fallacy
of de Méré's reasoning.
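The break-even point can be located numerically; a short Python sketch (ours) of the calculation just made:

```python
from math import log

# de Méré: a double six at least once in n throws of two dice
p_single = 1 / 36
n_fair = log(1 / 2) / log(1 - p_single)   # about 24.6 throws

p24 = 1 - (35 / 36) ** 24   # about 0.491, an unfavorable even-money bet
p25 = 1 - (35 / 36) ** 25   # about 0.506, a favorable one
assert p24 < 0.5 < p25
print(n_fair, p24, p25)
```

The narrow margin on either side of 1/2 explains why it took de Méré "extremely disastrous" losses over many games to detect the error empirically.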
31. Example 16. — An urn, A, contains a balls of which α are
white; another similar urn, B, contains b balls of which β are
white. A single ball is drawn from one of the two urns. What
is the probability that the ball is white? The beginner may easily
make the following error in the solution of this problem. The
probability to get a white ball from A is α/a, from B, β/b. Thus
the total probability to get a white ball is: α/a + β/b. This
result is, however, wrong, for we may, by selecting proper values
for a, b, α and β, obtain a total probability which in numerical
value is greater than unity. Thus if a = 7, b = 7, α = 5,
β = 4, we get as the total probability:

p = 5/7 + 4/7 = 9/7.

This result is evidently wrong, since a mathematical probability
is never an improper fraction. The error lies in the fact that we
have regarded the two events of drawing a ball from either urn
as independent and mutually exclusive. A simple application
of the symbolic rule for relative probabilities will give us the
result immediately. The main event, E, is composed of the two
following subsidiary events: (1) to get a white ball from A, or
(2) to get a white ball from B. We shall symbolically denote
these two events by A·W and B·W respectively. Thus we
have:

\[ P(E) = P(A\cdot W) + P(B\cdot W) = P(A)P_A(W) + P(B)P_B(W). \]

Now the probability of selecting urn A is P(A) = p₁ = ½; similarly
for B: P(B) = p₂ = ½. The probability of getting a white ball
from A when this particular urn is previously selected is expressed
by the relative probability:

\[ P_A(W) = \frac{\alpha}{a}. \]

Similarly for B:

\[ P_B(W) = \frac{\beta}{b}. \]

Substituting these different values in the expression for P(E)
we get finally:

\[ P(E) = \frac12\cdot\frac{\alpha}{a} + \frac12\cdot\frac{\beta}{b}
        = \frac12\left(\frac{\alpha}{a} + \frac{\beta}{b}\right). \]
For the particular numerical example we have:

\[ P(E) = \frac12\left(\frac{5}{7} + \frac{4}{7}\right) = \frac{9}{14}. \]
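The two-stage experiment can be checked by a short computation. The sketch below uses the urn contents of the numerical example and compares the exact value 9/14 with a simulation of the same experiment:

```python
from fractions import Fraction
import random

# Exact value: choose an urn with probability 1/2 each, then a ball at random.
a, alpha = 7, 5   # urn A: 7 balls, 5 white
b, beta  = 7, 4   # urn B: 7 balls, 4 white
p_white = Fraction(1, 2) * Fraction(alpha, a) + Fraction(1, 2) * Fraction(beta, b)
print(p_white)    # 9/14

# Monte Carlo check of the two-stage drawing
random.seed(1)
trials = 200_000
hits = 0
for _ in range(trials):
    n_balls, n_white = random.choice([(a, alpha), (b, beta)])
    hits += random.randrange(n_balls) < n_white
print(hits / trials)   # close to 9/14 ≈ 0.6429
```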
32. Example 17. — The probability of the happening of a
certain event, E, is p, while the probability of the non-occurrence
of the same event is q = 1 − p. The trial is now to be repeated
n times. The probability that there will be first α successes
and then β failures is:

\[ P(E^{\alpha})\,P_{E^{\alpha}}(\bar E^{\beta}) = p^{\alpha}\cdot q^{\beta}
   \qquad (\alpha + \beta = n). \]

This is the probability that the two complementary events E and
Ē happen in the order prescribed above. When the order in
which the successes and failures happen plays no rôle during
the n trials, that is to say, when it is only required to obtain α successes
and β failures in any order whatsoever in n total trials, then the
arrangement of the α factors p and β factors q is immaterial. The
total number of arrangements of n elements of which α are equal
to p and β equal to q is simply n!/(α!·β!). For any one particular
arrangement of α factors p and β factors q the probability of
the happening of the two complementary events in this particular
arrangement is equal to p^α·q^β. The Addition Theorem
immediately gives the answer for α successes and β failures in any
order whatsoever as:

\[ P(E^{\alpha}\cdot\bar E^{\beta}) = p_{\alpha}
   = \binom{n}{\alpha}p^{\alpha}q^{\beta}. \]

Let us, for the present, regard this probability as being a function
of the variable quantity α (n being a constant quantity). We
may then write:

\[ p_{\alpha} = \varphi(\alpha). \]

Letting α assume all positive integral values from 0 to n, the
above expression for p_α becomes:

\[ p_0 = \binom{n}{0}q^{n},\quad p_1 = \binom{n}{1}p\,q^{n-1},\quad
   \ldots,\quad p_n = \binom{n}{n}p^{n}. \]
These are the respective probabilities for no successes, one success,
two successes, ... and finally n successes in n trials. The
above quantities are, however, merely the different members of
the binomial expansion (p + q)^n. Since p + q = 1 from the
nature of the problem, we also have (p + q)^n = 1, or p₀ + p₁
+ p₂ + ⋯ + p_n = 1. This last equation is the symbolic
form of the simple hypothetical disjunctive judgment: E must
happen either 0, 1, 2, ⋯ or n times in n total trials. We shall
return to this problem later under the discussion of the Bernoullian
Theorem. In fact, the above example constitutes an essential
part of this famous theorem, which has proven one of the most
important and far-reaching in the whole theory of probability.
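A minimal sketch of these binomial probabilities (the function name is ours), verifying that the terms of (p + q)^n sum to unity:

```python
from math import comb

# Binomial probabilities p_alpha = C(n, alpha) p^alpha q^(n - alpha)
def binomial_pmf(n, p):
    q = 1 - p
    return [comb(n, a) * p**a * q**(n - a) for a in range(n + 1)]

probs = binomial_pmf(10, 0.3)
print(probs[0])       # q^n: the probability of no successes
print(sum(probs))     # the members of (p + q)^n sum to 1
```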
33. Example 18. De Moivre's Problem. — The following problem
was first given by the eminent French-English mathematician,
Abraham de Moivre, in a treatise entitled "De Mensura
Sortis," which was published in London about 1711.
An urn contains n + 1 balls marked 0, 1, 2, ⋯, n. A person
makes i drawings in succession, and each ball is put back in the
urn before the next drawing takes place. What is the probability
that the sum of the numbers on the i balls thus drawn equals s?
The first ball may be drawn in n + 1 ways, and the second ball
may also be drawn in n + 1 ways. Hence two balls may be
drawn in (n + 1)² ways, or i balls in (n + 1)^i ways. This is the
total number of equally possible cases.
If we expand the expression:

\[ (x^{0} + x^{1} + x^{2} + x^{3} + \cdots + x^{n})^{i} \tag{1} \]

by the multinomial theorem, we notice that the coefficient
of x^s arises out of the different ways in which 0, 1, 2, 3, ⋯, n
can be grouped together so as to form s by addition, which also
is the total number of favorable cases. The expression (1)
inside the bracket represents a geometrical progression, so that (1)
may be written as:

\[ (1-x^{n+1})^{i}(1-x)^{-i}
   = \left\{1 - \binom{i}{1}x^{n+1} + \binom{i}{2}x^{2n+2}
     - \binom{i}{3}x^{3n+3} + \cdots\right\}
     \times \left\{1 + \binom{i}{1}x + \binom{i+1}{2}x^{2} + \cdots\right\}. \]
By actual multiplication we get a power series in x. The terms
containing x^s are obtained in the following manner: the first term
of the first factor is multiplied by the term

\[ \binom{s+i-1}{s}x^{s} \] of the second factor,

the second term of the first factor by the term

\[ \binom{s-n+i-2}{s-n-1}x^{s-n-1} \] of the second factor,

the third term of the first factor by the term

\[ \binom{s-2n+i-3}{s-2n-2}x^{s-2n-2} \] of the second factor.
Thus the coefficient of x^s is equal to

\[ \binom{s+i-1}{s} - \binom{i}{1}\binom{s-n+i-2}{s-n-1}
   + \binom{i}{2}\binom{s-2n+i-3}{s-2n-2} - \cdots. \]

The above expression may by further reductions be brought to
the form:

\[ \frac{(s+1)(s+2)\cdots(s+i-1)}{1\cdot 2\cdots(i-1)}
   - \binom{i}{1}\frac{(s-n)(s-n+1)\cdots(s-n+i-2)}{1\cdot 2\cdots(i-1)}
   + \binom{i}{2}\frac{(s-2n-1)(s-2n)\cdots(s-2n+i-3)}{1\cdot 2\cdots(i-1)}
   - \cdots. \]
The series breaks off, of course, as soon as negative factors appear
in the numerator. The required probability is therefore

\[ P = \frac{1}{(n+1)^{i}}\left\{
   \frac{(s+1)(s+2)\cdots(s+i-1)}{1\cdot 2\cdots(i-1)}
   - \binom{i}{1}\frac{(s-n)(s-n+1)\cdots(s-n+i-2)}{1\cdot 2\cdots(i-1)}
   + \cdots\right\}. \]
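De Moivre's alternating series can be checked against a direct convolution of the uniform distribution on 0, 1, ⋯, n; a sketch (the function names are ours):

```python
from math import comb

def prob_sum(n, i, s):
    """P(sum of i draws, with replacement, from balls 0..n equals s),
    by de Moivre's alternating series."""
    total = 0
    k = 0
    while True:
        m = s - k * (n + 1)     # exponent left after removing k blocks x^(n+1)
        if m < 0:
            break               # the series breaks off here
        total += (-1) ** k * comb(i, k) * comb(m + i - 1, i - 1)
        k += 1
    return total / (n + 1) ** i

def prob_sum_conv(n, i, s):
    """Same probability by convolving the uniform distribution i times."""
    dist = [1.0]
    for _ in range(i):
        new = [0.0] * (len(dist) + n)
        for j, pj in enumerate(dist):
            for face in range(n + 1):
                new[j + face] += pj / (n + 1)
        dist = new
    return dist[s] if s < len(dist) else 0.0

# Three draws from balls 0..5 summing to 7 (equivalently, three dice summing to 10)
print(prob_sum(5, 3, 7), prob_sum_conv(5, 3, 7))   # both 27/216 = 0.125
```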
34. Example 19. — A single experiment or observation is
made on n pairs of opposite (complementary) events, E_a and Ē_a,
with the respective probabilities of happening p_a and q_a (a = 1,
2, 3, ⋯, n); to determine the probability that (1) exactly r,
(2) at least r of the events E_a will happen.
This problem is of great importance, especially in life assurance
mathematics. It happens frequently that an actuary is called
upon to determine the probability that exactly r persons will be
alive m years from now out of a group of n persons of any ages
whatsoever, each person's age and his individual coefficient of
survival through the period being known beforehand.
Various demonstrations have been given of this problem. The
first elementary proof was probably due to Mr. George King,
the English actuary, in his well-known text-book. The Austrian
mathematician and actuary, E. Czuber, has simplified King's
method in his "Wahrscheinlichkeitsrechnung" (1903). Later
the Italian actuary, Toja, gave an elegant proof in Bollettino
degli Attuari, Vol. 12. Finally another Italian mathematician,
P. Medolaghi, has investigated the problem from the standpoint
of symbolic logic. In the following we shall adhere to the demonstration
of Czuber and also give a short outline of the symbolic
method.
In order to answer the first part of the problem we must form
all possible combinations of r factors p and n − r factors q
and then sum all such combinations of n factors. Denoting
the event by E_{[r]} we have:

\[ P(E_{[r]}) = \sum p_{a}p_{b}\cdots(1-p_{s})(1-p_{t})\cdots(1-p_{n}), \tag{1} \]

the summation extending over every selection of r of the n events
as the ones which happen. We shall now denote the sum of all
products in (1) containing φ factors p by the symbol S_φ. It is
readily seen that φ will have all positive integral values from r to n
inclusive. We may therefore write the total compound probability
in the following form:

\[ P(E_{[r]}) = A_0 S_r + A_1 S_{r+1} + A_2 S_{r+2} + \cdots + A_{n-r} S_n. \tag{2} \]
The student must bear in mind that the different S are merely
symbols for the different sums of all the products of r, r + 1, r + 2,
⋯, n factors p respectively. Our problem is now to determine
the unknown coefficients A. It is easily seen that the coefficient
A₀ = 1, since all the different products containing r factors p appear
only once. The other coefficients of the form A do not depend
on the values of p, however. They remain therefore unaltered
if we equate all of the various p's and let them equal p. Expression
(1) then simply becomes $\binom{n}{r}p^{r}(1-p)^{n-r}$: we must
form all possible products of r factors p from the n similar factors, which can
be done in $\binom{n}{r}$ ways. The expression (2) on the other hand
becomes:

\[ A_0\binom{n}{r}p^{r} + A_1\binom{n}{r+1}p^{r+1}
   + A_2\binom{n}{r+2}p^{r+2} + \cdots + A_{n-r}\binom{n}{n}p^{n}. \]

Any S_φ is by definition the sum of all products containing φ
factors p, and we may form $\binom{n}{\varphi}$ such products from the n
elements p. But we saw above that φ might only have the positive
integral values from r to n inclusive; hence expression (2) will
naturally take the above form. We have therefore:

\[ \binom{n}{r}p^{r}(1-p)^{n-r}
   = A_0\binom{n}{r}p^{r} + A_1\binom{n}{r+1}p^{r+1}
   + \cdots + A_{n-r}\binom{n}{n}p^{n}. \]

Expanding the expression on the left-hand side by means of the
binomial theorem and equating the coefficients of equal powers
of p, we get:

\[ A_k\binom{n}{r+k} = (-1)^{k}\binom{n}{r}\binom{n-r}{k}, \]

or:

\[ A_k = (-1)^{k}\binom{r+k}{k} \qquad (k = 0, 1, 2, \cdots, n-r). \]

Substituting these values in (2) for the unknown coefficients A,
we shall have:

\[ P(E_{[r]}) = S_r - \binom{r+1}{1}S_{r+1} + \binom{r+2}{2}S_{r+2}
   - \cdots + (-1)^{n-r}\binom{n}{n-r}S_n. \]
If we expand the algebraic expression:

\[ \frac{S^{r}}{(1+S)^{r+1}}, \]

we have:

\[ \frac{S^{r}}{(1+S)^{r+1}}
   = S^{r} - \binom{r+1}{1}S^{r+1} + \binom{r+2}{2}S^{r+2}
   - \cdots + (-1)^{n-r}\binom{n}{n-r}S^{n} - \cdots. \]

We may therefore write P(E_{[r]}) = S^r/(1 + S)^{r+1}, when every
exponent is replaced by an index number (i. e., S^k replaced by S_k)
and the expansion broken off at the term S^n. The student must
of course constantly bear in mind the symbolic meaning of S^k.
The second part of the problem is easily solved by the symbolic
method. Denoting this particular event by E_r, we have
the following identity:

\[ P(E_r) - P(E_{r+1}) = P(E_{[r]}), \]

or

\[ P(E_r) - P(E_{[r]}) = P(E_{r+1}). \]

The following relations are self-evident:

\[ P(E_0) = 1; \]
\[ P(E_1) = P(E_0) - P(E_{[0]}) = 1 - \frac{1}{1+S} = \frac{S}{1+S};
   \text{ also} \]
\[ P(E_2) = P(E_1) - P(E_{[1]})
   = \frac{S}{1+S} - \frac{S}{(1+S)^{2}} = \frac{S^{2}}{(1+S)^{2}}. \]

The complete induction gives us finally:

\[ P(E_r) = \frac{S^{r}}{(1+S)^{r}}. \]

Assuming the rule is true for r, we may easily prove it is true
for r + 1 also. We have in fact:

\[ P(E_{r+1}) = \frac{S^{r}}{(1+S)^{r}} - \frac{S^{r}}{(1+S)^{r+1}}
   = \frac{S^{r+1}}{(1+S)^{r+1}}. \]
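Czuber's inclusion-exclusion form can be verified against the direct summation of formula (1). A sketch with hypothetical survival probabilities (function and variable names are ours):

```python
from itertools import combinations
from math import comb, prod

def exactly_r_direct(ps, r):
    """Sum over all ways r events happen and the rest fail (formula (1))."""
    n = len(ps)
    total = 0.0
    for idx in combinations(range(n), r):
        total += prod(ps[i] if i in idx else 1 - ps[i] for i in range(n))
    return total

def exactly_r_czuber(ps, r):
    """Czuber's form: S_r - C(r+1,1) S_{r+1} + C(r+2,2) S_{r+2} - ..."""
    n = len(ps)
    S = lambda phi: sum(prod(ps[i] for i in idx)
                        for idx in combinations(range(n), phi))
    return sum((-1) ** k * comb(r + k, k) * S(r + k)
               for k in range(n - r + 1))

ps = [0.9, 0.8, 0.75, 0.6]   # e.g. four individual survival probabilities
for r in range(len(ps) + 1):
    print(r, exactly_r_direct(ps, r), exactly_r_czuber(ps, r))
```

The two columns agree for every r, and the values over r = 0, ⋯, n sum to unity, as the complete disjunction requires.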
35. Example 20. Tchebycheff's Problem. — The following solution
of a very interesting problem is due to the eminent Russian
mathematician, Tchebycheff, one of the foremost of modern
analysts.
A proper fraction is chosen at random. What is the probability
that it is in its lowest terms?
Stated in a slightly different wording the same question may
also be put as follows: If A/B is a proper fraction, what is the
probability that A and B are prime to each other?
If p₂, p₃, p₅, ⋯, p_m, ⋯ denote respectively the probabilities that
each of the primes 2, 3, 5, ⋯, m, ⋯ is not a common factor of
numerator and denominator of A/B, then the probability that
no prime number is a common factor is:

\[ P = p_2\cdot p_3\cdot p_5 \cdots p_m \cdots \text{ad inf.} \tag{I} \]

This follows from the multiplication theorem and from the fact
that the sequence of prime numbers is infinite.
Tchebycheff now first finds the probability q_m = 1 − p_m that
the fraction A/B does contain the prime m as a factor of both A
and B. By dividing any integral number by the prime m we
obtain, besides the quotient, a certain remainder that must be
one of the following numbers, viz.:

0, 1, 2, 3, 4, ⋯, (m − 1).

Each of the above remainders may be regarded as a possible
event. The probability of obtaining 0 as a remainder is accordingly
1/m. The probability that m is contained as a factor of A is
therefore 1/m. This same quantity is also the probability that
m is a factor of B. The probability that both A and B are
divisible by m is therefore:

\[ q_m = 1 - p_m = \frac{1}{m}\cdot\frac{1}{m} = \frac{1}{m^{2}},
   \quad\text{or}\quad p_m = 1 - \frac{1}{m^{2}}. \]

Hence we have for the various primes:

\[ p_2 = 1 - \frac{1}{2^{2}},\quad p_3 = 1 - \frac{1}{3^{2}},\quad
   p_5 = 1 - \frac{1}{5^{2}},\ \cdots. \]
Formula (I) then takes the form:

\[ P = \left(1-\frac{1}{2^{2}}\right)\left(1-\frac{1}{3^{2}}\right)
       \left(1-\frac{1}{5^{2}}\right)\cdots \text{ad inf.} \]

Forming the reciprocal 1/P we get:

\[ \frac{1}{P} = \frac{1}{1-\dfrac{1}{2^{2}}}\cdot
                 \frac{1}{1-\dfrac{1}{3^{2}}}\cdot
                 \frac{1}{1-\dfrac{1}{5^{2}}}\cdots \text{ad inf.} \]

Now each factor on the right-hand side is the sum of a geometrical
progression, as:

\[ \frac{1}{P} = \left(1+\frac{1}{2^{2}}+\frac{1}{(2^{2})^{2}}+\cdots\right)
   \left(1+\frac{1}{3^{2}}+\frac{1}{(3^{2})^{2}}+\cdots\right)
   \left(1+\frac{1}{5^{2}}+\frac{1}{(5^{2})^{2}}+\cdots\right)\cdots
   \text{ad inf.} \]

Multiplying out we shall have:

\[ \frac{1}{P} = 1 + \frac{1}{2^{2}} + \frac{1}{3^{2}} + \frac{1}{4^{2}}
   + \frac{1}{5^{2}} + \cdots \text{ad inf.} \]

The above infinite series is, however, merely the well-known
Eulerian expression for π²/6, hence:

\[ P = \frac{6}{\pi^{2}} = 0.6079\ldots \]
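The limiting value 6/π² may be checked by counting reduced proper fractions directly; a rough numerical sketch:

```python
from math import gcd, pi

# Count the proper fractions A/B with 1 <= A < B <= N that are in
# lowest terms, and compare the proportion with 6/pi^2.
N = 1000
pairs = coprime = 0
for B in range(2, N + 1):
    for A in range(1, B):
        pairs += 1
        coprime += gcd(A, B) == 1
ratio = coprime / pairs
print(ratio, 6 / pi**2)   # both near 0.6079
```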
Suppose furthermore we were assured that none of the three
primes 2, 3, 5 was a common factor of both A and B. What
would then be the probability that the fraction might be reduced
by division by one or more of the other primes?
Denoting by the symbol P₍₇₎ the probability that none of the
primes from 7 upwards is a common factor, we get:

\[ P_{(7)} = \left(1-\frac{1}{7^{2}}\right)\left(1-\frac{1}{11^{2}}\right)
             \left(1-\frac{1}{13^{2}}\right)\cdots \text{ad inf.;} \]

also:

\[ P = P_{(7)}\left(1-\frac{1}{2^{2}}\right)\left(1-\frac{1}{3^{2}}\right)
      \left(1-\frac{1}{5^{2}}\right), \]
or:

\[ P_{(7)} = \frac{6}{\pi^{2}}\left[\left(1-\frac{1}{2^{2}}\right)
   \left(1-\frac{1}{3^{2}}\right)\left(1-\frac{1}{5^{2}}\right)\right]^{-1}
   = 0.950\ldots \]

The probability of the divisibility of both numerator and
denominator of a fraction chosen at random by a prime larger
than 5 is thus:

\[ 1 - P_{(7)} = 0.050 \approx \frac{1}{20}. \]
The summation of the infinite series of the reciprocals of the
squares of the natural numbers baffled for a long time the skill
of some of the most eminent mathematicians. Jacob Bernoulli,
the renowned classical writer on probabilities, proved its convergency
but failed to find its sum. The final summation was first
performed by Euler.
CHAPTER V.
MATHEMATICAL EXPECTATION.
36. Definition, Mean Values. — It is a common belief among
many people that gambling and all kinds of betting have their
source in reckless desire. This is often argued by moral reform-
ers, but cannot be said to be the true cause. Whenever by ordi-
nary gambling or by a bet, actual value is exposed to a complete or
partial loss, this exposure is not due to the fact that the gamester
is reckless, but because there is hope of an actual gain. " Hope,"
says Spinoza in his treatise on ethics, "is the indeterminate joy
caused by the conception of a future state of affairs of whose
outcome we are in doubt." Actual mathematical calculation
cannot be attempted on the basis of this definition any more
than it could be attempted to determine a mathematical prob-
ability from the definition of Aristotle. "We disregard there-
fore the psychological element of desire, which is associated with
hope or expectation as well as the anxiousness or dread associated
with the related psychological element of non-desire" (Cantor).
The so-called mathematical expectation is the product of an
expected gain in actual value and the mathematical probability
of obtaining such a gain. The danger of loss may in this case
be regarded as a negative gain. Thus if a person, A, may expect
the gain, G, from the event, E, whose probability of happening
is equal to p, then e = p·G is his mathematical expectation.
The quantity expressed by the symbol e is here the amount it
is safe to hazard for the expected gain, G. We may also regard
the quantity e as a mean value or average value. Among a
large number of n cases only np will bring the gain, G, the others
not. Thus the total gain is npG, and the mean gain for a single
case is:

\[ e = \frac{npG}{n} = pG. \]
Suppose we have n mutually exclusive events, E₁, E₂, ⋯, E_n,
forming a complete disjunction. For their respective probabilities
we have then the following equation:

\[ p_1 + p_2 + p_3 + \cdots + p_n = 1. \]

If the actual occurrence of a certain one of these events, say
E_a, brings a gain of G_a, then the total value of the mathematical
expectation of the n events is:

\[ e = G_1 p_1 + G_2 p_2 + G_3 p_3 + \cdots + G_n p_n. \]

Since Σp_a = 1 this result may be written:

\[ e(p_1 + p_2 + \cdots + p_n)
   = G_1 p_1 + G_2 p_2 + G_3 p_3 + \cdots + G_n p_n; \]

hence e may be regarded as the mean value of the different
quantities G_a with the weights p_a (a = 1, 2, 3, ⋯, n).
Although we shall discuss the theory of mean values in a
following chapter a few preliminary remarks might not be out
of place here.
A variable quantity X is related to a series of events E₁, E₂,
E₃, ⋯, E_n (it being assumed that these events form a complete
disjunction) in such a manner that when E_a happens X takes on
the value x_a (a = 1, 2, 3, ⋯, n). If furthermore p₁, p₂, p₃, ⋯
denote the respective probabilities of the occurrence of E₁, E₂,
E₃, ⋯, then

\[ M(X) = p_1 x_1 + p_2 x_2 + \cdots + p_n x_n \]

is called the mean value or simply the mean of X.
The above definition may be illustrated by the following
concrete urn-scheme. An urn contains N balls of which a₁ balls
are marked x₁, a₂ balls are marked x₂, ⋯, and finally a_n balls are
marked x_n, where a₁ + a₂ + a₃ + ⋯ + a_n = N. Each drawing from the
urn produces a certain number X, which may assume the n different
values x₁, x₂, x₃, ⋯, x_n, each with the respective probabilities:

\[ p_1 = \frac{a_1}{N},\quad p_2 = \frac{a_2}{N},\ \cdots,\
   p_n = \frac{a_n}{N}. \]

The arithmetic mean of all the numbers written on the balls is:

\[ \frac{a_1 x_1 + a_2 x_2 + \cdots + a_n x_n}{N}, \]

which agrees with the mean as defined above.
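The urn-scheme mean can be illustrated by a small computation (the ball markings and counts are hypothetical):

```python
from fractions import Fraction

# a_k balls marked x_k; the mean M(X) = sum of p_k * x_k with p_k = a_k / N
counts = {10: 3, 20: 5, 50: 2}     # marking -> number of balls
N = sum(counts.values())           # 10 balls in all
mean = sum(Fraction(a, N) * x for x, a in counts.items())
print(mean)   # (3*10 + 5*20 + 2*50) / 10 = 23
```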
37. The Petrograd (St. Petersburg) Problem. — In this
connection it is worthy to note a celebrated problem, which on
account of its paradoxical nature has become a veritable stumbling
block, and has been discussed by some of the most eminent
writers on probabilities. The problem was first suggested by
Daniel Bernoulli in a communication to the Petrograd — or, as
it was then called, St. Petersburg — Academy in 1738.
The Petrograd problem may shortly be stated as follows: Two
persons, A and B, are interested in a game of tossing a coin under
the following conditions. An ordinary coin is tossed until head
turns up, which is the deciding event. If head turns up the first
time, A pays one dollar to B; if head appears first at the second
toss, B is to receive two dollars; if first at the third toss, four
dollars; and so on. What is the mathematical expectation of B?
Or, in other words, how much must B pay to A before the game
starts in order that the game may be considered fair?
The mathematical expectation of B in the first trial is
½ × 1 = ½. The mathematical expectation for head in the second
throw is ¼ × 2 = ½. In general, the mathematical probability
that head appears for the first time in the nth toss is
(½)ⁿ, and the co-ordinated expectation is 2^{n−1} ÷ 2^{n} = ½. Thus the
total expectation is expressed by the following series:

\[ \frac12 + \frac12 + \frac12 + \cdots. \]

When n = ∞ as its limiting value it thus appears that B
could afford to pay an infinite amount of money for his expected
gain.
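A simulation makes the paradox tangible: every finite sample of games yields a finite average gain, yet the average tends to creep upward as more games are played. A sketch (function names ours):

```python
import random

# One Petrograd game: B receives 2^(n-1) dollars when head first
# appears at the n-th toss.
def play(rng):
    payout = 1
    while rng.random() < 0.5:   # tail: the payout doubles and we toss again
        payout *= 2
    return payout

rng = random.Random(42)
games = 100_000
avg = sum(play(rng) for _ in range(games)) / games
print(avg)   # finite for any sample, though the theoretical expectation is infinite
```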
38. Various Explanations of the Paradox. The Moral Expectation.
— This evidently paradoxical result has called forth a number
of explanations of various forms by some eminent mathematicians.
One of the commentators was D'Alembert. It was
to be expected that the famous encyclopaedist, who — as we have
seen — did not view the theory of probabilities in too kindly a
manner, would not hesitate to attack. He returns repeatedly
to this problem in the "Opuscules" (1761) and in "Doutes et
questions" (Amsterdam, 1770).
D'Alembert distinguishes between two forms of possibilities,
viz., metaphysical and physical possibilities. An event is by
him called a metaphysical possibility when it is not absurd.
When the event is not too "uncommon" in the ordinary course
of happenings it is a physical possibility. That head would
appear for the first time after 1,000 throws is metaphysically
possible but quite impossible physically. This contention is
rather bold. "What would," as Czuber remarks, "D'Alembert
have said to an actual reported case in 'Grunert's Archiv,' where
in a game of whist each of the four players held 13 cards of one
suit?" The numerical probability of such an event as expressed
by mathematical probabilities is (635,013,559,600)⁻¹.
D'Alembert's definitions, including the half metaphorical term
"ordinary course," are rather vague. And what numerical
value of the mathematical probability constitutes the physical
impossibility? D'Alembert gives three arbitrary solutions for
the probability of getting head in the nth throw, namely:

\[ \frac{1}{2^{n}(1+\beta n^{\alpha})}, \qquad
   \frac{1}{2^{\,n+n^{\alpha}/q}}, \qquad
   \frac{B}{2^{n} + K n^{\beta}}, \]

where α, β, B, K are constants and q an uneven number.
Daniel Bernoulli himself gives a solution wherein he introduces
the term "moral expectation." If a person possesses a sum of
money equal to x, then according to Bernoulli

\[ dy = \frac{k\,dx}{x} \]

is the moral expectation of the increase dx, k being a constant
quantity. Integrating we get:

\[ y = \int_a^b dy = k\int_a^b \frac{dx}{x}
     = k(\log b - \log a) = k\log\frac{b}{a}, \]

which is the moral expectation of an increase b − a of an original
value a. If now x denotes the sum owned by B, we may replace
the mathematical expectations by their corresponding moral
expectations, that is to say, replace 2^{n−1}/2^{n} by
(k/2^{n}) log((x + 2^{n−1})/x), and we then have:

\[ k\left(\frac12\log\frac{x+1}{x} + \frac14\log\frac{x+2}{x}
   + \frac18\log\frac{x+4}{x} + \cdots
   + \frac{1}{2^{n}}\log\frac{x+2^{n-1}}{x} + \cdots\right), \]
which is a convergent series. In this connection it may be
mentioned that the Bernoullian hypothesis has found quite an
extensive use in the modern theory of utility.
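That the moral-expectation series converges is easily seen numerically; a sketch with k = 1 and hypothetical fortunes x:

```python
from math import log

# Partial sums of Bernoulli's series: sum over n of
# (1 / 2^n) * log((x + 2^(n-1)) / x), taking k = 1.
def moral_expectation(x, terms=200):
    return sum(log((x + 2 ** (n - 1)) / x) / 2 ** n
               for n in range(1, terms + 1))

for x in (10, 100, 1000):
    print(x, moral_expectation(x))   # finite, and smaller for larger fortunes
```

The late terms behave like ((n − 1) log 2 − log x)/2^n, so the sum is finite for every x, in contrast with the divergent mathematical expectation.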
De Morgan, in his splendid little treatise "On Probabilities,"
takes the view that the solution as first given is by no means an
anomaly. He quotes an actual experiment in coin tossing by
Buffon. Out of 2,048 trials, 1,061 gave head at the first toss,
494 at the second, 232 at the third, 137 at the fourth, 56 at the
fifth, 29 at the sixth, 25 at the seventh, 8 at the eighth and 6 at
the ninth. Computing the various mathematical expectations,
we find that the maximum value is found in the 25 sets with head
in the seventh toss, which give a gain of 25 × 64 = 1,600. The
most rare occurrence, the 6 sets of head in the ninth throw, gives
a gain of 6 × 256 = 1,536, which is the next highest gain in all
the nine sets. De Morgan furthermore contends that if Buffon
had tried a thousand times as many games, the results would
not only have given more, but more per game, arguing "that a
larger net would have caught not only more fish but more varieties
of fish; and in two millions of sets, we might have seen cases in
which head did not appear till the twentieth throw." Furthermore,
"the player might continue until he had realized not only
any given sum, but any given sum per game." Therefore,
according to De Morgan, the mathematical expectation of a
player in a single game must be infinite.
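Buffon's recorded counts can be tabulated directly; the sketch below recomputes the gain contributed by each toss number:

```python
# Buffon's 2,048 recorded games: counts[n] games ended with head at toss n + 1.
counts = [1061, 494, 232, 137, 56, 29, 25, 8, 6]

# The payout when head first appears at toss n + 1 is 2^n dollars.
gains = [c * 2 ** n for n, c in enumerate(counts)]
print(gains)                 # [1061, 988, 928, 1096, 896, 928, 1600, 1024, 1536]
print(sum(gains) / sum(counts))   # B's average gain per game in the sample
```

The largest contribution, 1,600, indeed comes from the 25 games decided at the seventh toss, and the 1,536 from the rare ninth-toss games is the next highest, just as the text observes.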
CHAPTER VI.
PROBABILITY A POSTERIORI.
39. Bayes's Rule. A Posteriori Probabilities. — The problems
hitherto considered have all had certain points in common.
Before entering upon the calculation of the mathematical
probability of the happening of the event in question, we knew
beforehand a certain complex of causes which operated in the
general domain of action. We were also able to separate this
general complex of productive causes into two distinctive minor
domains of complexes, of which one would bring forth the event,
E, while the other domain would act towards the production of
the opposite event, Ē. Furthermore, we were also able to
measure the respective quantitative magnitudes of the two
domains, and then, by a simple algebraic operation, determine
the probability as a proper fraction. The addition and multiplication
theorems did not introduce any new principles, but
only gave us a set of systematic rules which facilitated and
shortened the calculation of the relations between the different
absolute probabilities. The above method of determination
of a mathematical probability is known as an a priori determination,
and such probabilities are termed a priori probabilities.
The problems treated in the preceding chapters have nearly
all been related to different games of chance or purely abstract
mathematical quantities. The inorganic nature of this kind of
problems has made it possible for us to treat them in a relatively
simple manner. In many of the problems which we shall consider
hereafter, organic elements enter as a dominant factor and
make the analysis much more complicated and difficult.
All social and biological investigations, which are of a much
larger benefit and practical value than the problems in games of
chance, lead often to a completely different category of probability
problems, which are known as "a posteriori probabilities."
In problems where organic life enters into the calculations, the
complex of productive causes is so varied and manifold that
our minds are not able to pigeonhole the different productive
causes, placing them in their proper domains of action. But we
know that such causes do exist and are the origin of the event.
If now, by a series of observations, we have noticed the actual
occurrence of the event, E (or the occurrence of the opposite
event, Ē), the problem of the determination of an a posteriori
probability is to find the probability that the event E originated
from a certain complex, say F. We must then, first of all,
form a complete hypothetical judgment of the form: E happens
either from the complex F₁, or F₂, or F₃, ⋯, or F_n. But we
must not forget that, in general, the different complexes F_a
(a = 1, 2, ⋯, n) of the disjunction are not known a priori.
We must, therefore, determine the respective probabilities for
the actual existence of such disjunctive complexes F_a. These
probabilities of existence for the complexes of causes are in
general different for each member, a fact which has often been
overlooked by many investigators and writers on a posteriori
probabilities, and which has given rise to meaningless and
paradoxical results.
40. Discovery and History of the Rule. — The first discoverer
of the rule for the computation of a posteriori probabilities by
a purely deductive process was the English clergyman, T. Bayes.
Bayes's treatise was first published after the death of the author
by his friend, Dr. Price, in the Philosophical Transactions for 1763.
The treatise by the English clergyman was for a long time
almost forgotten, even by the author's own countrymen; and
later English writers have lost sight of the true "Bayes's Rule"
and substituted a false, or, to be more accurate, a special case of
the exact rule, in the different algebraic texts, under the discussion
of the so-called "inverse probabilities," a name which is due
probably to de Morgan, and which in itself is a great misnomer.
This point we shall presently discuss in detail.
The careless application of the exact rule has recently led to
a certain distrust of the whole theory of "a posteriori probabilities."
Scandinavian mathematicians were probably the first
to criticize the theory. In 1879, Mr. J. Bing, a Danish actuary,
took a very critical attitude towards the mathematical principles
underlying Bayes's Rule, in a scholarly article in the mathematical
journal Tidsskrift for Mathematik. Bing's article caused
a sharp, and often heated, discussion among the older and younger
Danish mathematicians at that time; but his views seem to have
gained the upper hand, and even so great an authority on the
whole subject as the late Dr. T. Thiele, in his well-known work,
"Theory of Observations" (London, 1903), refers to Bing's
article as "a crushing proof of the fallacies underlying the
determination of a posteriori probabilities by a purely deductive
method." As recently as 1908, the Danish writer on philosophy,
Dr. Kroman, has taken up the cudgels in defense of Bayes in a
contribution in the Transactions of the Royal Danish Academy
of Science, which has done much towards the removal of many
obscure and erroneous views of the older authors. Among
English writers, Professor Chrystal, in a lecture delivered before
the Actuarial Society of Edinburgh, has also given a sharp
criticism of the rule, although he does not go so deeply into the
real nature of the problem as either Bing or Kroman.
Despite Chrystal's advice to "bury the laws of inverse probabilities
decently out of sight, and not embalm them in text books
and examination papers," the old view still holds sway in recent
professional examination papers. It is therefore absolutely
necessary for the student preparing for professional examinations
to be acquainted with the theory. In the following paragraphs
we shall, therefore, give the mathematical theory of Bayes's
Rule with several examples illustrating its application to actual
problems, together with a criticism of the rule.
41. Bayes's Rule (Case I). — (The different complexes of causes
producing the observed event, E, possess different a priori probabilities
of existence.) Let E denote a certain state or condition
which can appear under only one of the mutually exclusive
complexes of causes F₁, F₂, ⋯, and not otherwise. Let the
probability for the actual existence of F₁ be κ₁, and, if F₁ really
exists, then let ω₁ be the "productive probability" of bringing
forth the observed event, E (E being of a different nature from
F), which can only occur after the previous existence of one of
the mutually exclusive complexes, F. Let, in the same manner,
F₂ have an "existence probability" of κ₂ and a "productive
probability" of ω₂, F₃ an existence probability of κ₃ and a productive
probability of ω₃, ⋯, etc. If now, by actual observation,
we have noted that the event E has occurred exactly m
times in n trials, then the probability that the complex F₁ was
the origin of E is:

\[ Q_1 = \frac{\kappa_1\,\omega_1^{m}(1-\omega_1)^{n-m}}
             {\sum_a \kappa_a\,\omega_a^{m}(1-\omega_a)^{n-m}}. \]

Similarly, the probability that complex F₂ was the origin is:

\[ Q_2 = \frac{\kappa_2\,\omega_2^{m}(1-\omega_2)^{n-m}}
             {\sum_a \kappa_a\,\omega_a^{m}(1-\omega_a)^{n-m}}, \]

and so on for the other complexes.
Proof. — Let the number of equally possible cases in the general
domain of action which lead to one of the complexes F_a be t.
Furthermore, of these t cases let f₁ be favorable for the existence
of complex F₁, f₂ for F₂, f₃ for F₃, ⋯, etc. Then the probabilities
for the existence of the different complexes F_a (a = 1, 2, 3, ⋯, n)
are:

\[ \kappa_1 = \frac{f_1}{t},\quad \kappa_2 = \frac{f_2}{t},\quad
   \kappa_3 = \frac{f_3}{t},\ \cdots \ \text{respectively.} \]

Of the f₁ favorable cases for complex F₁, λ₁ are also favorable for
the occurrence of E.
Of the f₂ favorable cases for complex F₂, λ₂ are also favorable for
the occurrence of E.
Of the f₃ favorable cases for complex F₃, λ₃ are also favorable for
the occurrence of E.
The probability of the happening of E under the assumption that
F₁ exists, i. e., the relative probability P_{F₁}(E), is:

\[ \omega_1 = \frac{\lambda_1}{f_1}, \]

or in general:

\[ \omega_a = \frac{\lambda_a}{f_a} \qquad (a = 1, 2, 3, \cdots). \]
The total number of equally likely cases for the simultaneous
occurrence of the event E with either one of the favorable cases
for F₁, F₂, F₃, ⋯ is:

\[ \lambda_1 + \lambda_2 + \lambda_3 + \cdots = \sum_a \lambda_a. \]

The number of favorable cases for the simultaneous occurrence
of F₁ and E is λ₁; for the simultaneous occurrence of F₂ and E,
λ₂; ⋯, etc. Hence we have as measures for their corresponding
probabilities:

\[ Q_1 = \frac{\lambda_1}{\sum_a \lambda_a},\quad
   Q_2 = \frac{\lambda_2}{\sum_a \lambda_a},\ \cdots,\ \text{etc.} \]

But

\[ \lambda_1 = \omega_1 f_1,\quad \lambda_2 = \omega_2 f_2,\ \cdots,\ \text{etc.,} \]

and

\[ f_1 = \kappa_1 t,\quad f_2 = \kappa_2 t,\ \cdots,\ \text{etc.} \]

Hence

\[ \lambda_1 = \omega_1\kappa_1 t,\quad \lambda_2 = \omega_2\kappa_2 t,\
   \cdots,\ \text{etc.} \]

Substituting these values in the above expressions for Q₁, Q₂, ⋯
we get:

\[ Q_1 = \frac{\kappa_1\omega_1}{\sum_a \kappa_a\omega_a},\quad
   Q_2 = \frac{\kappa_2\omega_2}{\sum_a \kappa_a\omega_a},\ \cdots \]

as the respective probabilities that the observed event originated
from the complexes F₁, F₂, F₃, ⋯. Such probabilities are called
a posteriori probabilities.
Let us now for a moment investigate the above expressions for
Q₁, Q₂, ⋯. The numerator in the expression for Q₁ is κ₁·ω₁.
But κ₁ is simply the a priori probability for the existence of F₁,
while ω₁ is the a priori productive probability of bringing forth
the observed event from the complex F₁. The product κ₁·ω₁ is
simply the relative probability P_{F₁}(E), or the probability that
the event E originated from F₁. In the denominator we have
the expression Σκ_aω_a (a = 1, 2, ⋯, n), which is the total probability
of getting E from any of the complexes F_a. From Example 17
(Chapter IV) we know that the probability of getting E exactly m
times from F₁ in n total trials is:

\[ P_1 = \binom{n}{m}\kappa_1\,\omega_1^{m}(1-\omega_1)^{n-m}, \]

and the probability of getting E from any one of the complexes, F,
m times out of n is:

    ΣP_α = (n choose m)·Σκ_α·ω_α^m·(1 − ω_α)^(n−m)    (α = 1, 2, 3, ...).

If, by actual observation, we know the event E to have happened
exactly m times out of n, then the a posteriori probability that
F₁ was the origin is:

    Q₁ = κ₁·ω₁^m·(1 − ω₁)^(n−m) / Σκ_α·ω_α^m·(1 − ω_α)^(n−m)    (α = 1, 2, 3, ...).    (I)
The binomial coefficients (n choose m) in numerator and denominator cancel each
other, of course. It will be noticed that, in the above proof, it
is not assumed that the a posteriori probability is proportional
to the a priori probability, an assumption usually made in the
ordinary texts on algebra.
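The computation prescribed by formula (I) is purely mechanical and may be sketched in a few lines of modern code (the function name `posterior` and the use of exact rational arithmetic are our own choices, not part of the text):

```python
from fractions import Fraction

def posterior(kappa, omega, m, n):
    # Formula (I): Q_a is proportional to kappa_a * omega_a^m * (1 - omega_a)^(n - m);
    # the binomial coefficients cancel, so a simple normalization suffices.
    terms = [k * w**m * (1 - w)**(n - m) for k, w in zip(kappa, omega)]
    total = sum(terms)
    return [t / total for t in terms]

# Three complexes with equal existence probabilities 1/3 and productive
# probabilities 0, 1/2, 1; the event E observed once in one trial:
third = Fraction(1, 3)
q = posterior([third, third, third],
              [Fraction(0), Fraction(1, 2), Fraction(1)], 1, 1)
# q == [0, 1/3, 2/3]
```

The complex with productive probability 0 receives the a posteriori probability 0, while the weights of the others are proportional to κ·ω.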
42. Bayes's Rule (Case II). — (Special Case. The a priori
probabilities of existence of the different complexes are equal.)
Sometimes the different complexes F may be of such special
characters that their a priori probabilities of existence are equal,
i. e.,

    κ₁ = κ₂ = κ₃ = ... = κ_n.

In this case equation (I) simply reduces to:

    Q₁ = ω₁^m·(1 − ω₁)^(n−m) / Σω_α^m·(1 − ω_α)^(n−m).    (II)
Equation (I) gives, however, the most general expression for
Bayes's Rule, which may be stated as follows:
If a definite observed event, E, can originate from a certain series
of mutually exclusive complexes, F, and if the actual occurrence of
the event has been observed, then the probability that it originated
from a specified complex or a specified group of complexes is also
the "a posteriori" probability or probability of existence of the
specified complex or group of complexes.
43. Determination of the Probabilities of Future Events
Based Upon Actual Observations. — It happens frequently that
our knowledge of the general domain of action is so incomplete,
that we are not able to determine, a priori, the probability of the
occurrence of a certain expected event. As we already have
stated in the introduction to a posteriori probabilities, this is
nearly always the case with problems wherein organic life enters
as a determining factor or momentum. But the same state of
affairs may also occur in the category of problems relating to games
of chance, which we have hitherto considered. Suppose we had
an urn which was known to contain white and black balls only,
but the actual ratio in which the balls of the two different colors
were mixed, was unknown. With this knowledge beforehand,
we should not be able to determine the probability for the drawing
of a white ball. If, on the other hand, we knew, from actual
experience by repeated observations, the results of former draw-
ings from the same urn when the conditions in the general domain
of action remained unchanged during each separate drawing, then
these results might be used in the determination of the prob-
ability of a specified event by future drawings.
Our problem may be stated in its most general form as follows:
Let F denote a certain state or condition in the general domain
of action, which state or condition can appear only in one or the
other of the mutually exclusive forms: F₁, F₂, F₃, ..., and not
otherwise. Let the probabilities of existence of F₁, F₂, F₃, ... be
κ₁, κ₂, κ₃, ... respectively, and when one of the complexes F₁, F₂,
F₃, ... exists (occurs) let ω₁, ω₂, ω₃, ... be the respective pro-
ductive probabilities of bringing forth a specified event, E.
If now, by actual observation, we know the event, E, to have
happened exactly m times out of n total trials (the conditions in
the general domain of action being the same at each individual
trial), what is then the probability that the event, E, will happen
in the (n + 1)th trial also?
By Bayes's Rule we determined the "a posteriori" probabili-
ties or the probabilities of existence of the complexes F₁, F₂, ...
as:

    Q₁ = κ₁·ω₁^m·(1 − ω₁)^(n−m) / Σκ_α·ω_α^m·(1 − ω_α)^(n−m),
    Q₂ = κ₂·ω₂^m·(1 − ω₂)^(n−m) / Σκ_α·ω_α^m·(1 − ω_α)^(n−m),
    (α = 1, 2, 3, ...).
In the (n + 1)th trial E may happen from any one of the mutually
exclusive complexes F₁, F₂, F₃, ..., whose respective probabilities
in producing the event, E, are ω₁, ω₂, ω₃, ... . The addition
theorem then gives us as the total probability of the occurrence
of E in the (n + 1)th trial:

    R = ΣQ_α·ω_α = Q₁·ω₁ + Q₂·ω₂ + Q₃·ω₃ + ...
      = Σκ_α·ω_α^(m+1)·(1 − ω_α)^(n−m) / Σκ_α·ω_α^m·(1 − ω_α)^(n−m)    (α = 1, 2, 3, ...).    (III)
If the a priori probabilities of existence are of equal magnitude
(Case II) the factors κ in the above expression cancel each other
in numerator and denominator and we have

    R = Σω_α^(m+1)·(1 − ω_α)^(n−m) / Σω_α^m·(1 − ω_α)^(n−m).    (IV)
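Formulas (III) and (IV) admit the same mechanical treatment. The following sketch (`prob_next` is our own name for the quantity R, and the example values are illustrative only):

```python
from fractions import Fraction

def prob_next(kappa, omega, m, n):
    # Formula (III): probability that E happens in trial n + 1, given that
    # it has happened m times in the first n trials.
    num = sum(k * w**(m + 1) * (1 - w)**(n - m) for k, w in zip(kappa, omega))
    den = sum(k * w**m * (1 - w)**(n - m) for k, w in zip(kappa, omega))
    return num / den

# With equal kappas the factors cancel and (III) reduces to (IV). Three
# complexes of existence probability 1/3 each, productive probabilities
# 0, 1/2, 1, and one occurrence of E in one trial:
third = Fraction(1, 3)
r = prob_next([third] * 3, [Fraction(0), Fraction(1, 2), Fraction(1)], 1, 1)
# r == 5/6
```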
44. Examples on the Application of Bayes's Rule. — Example
21. — An urn contains two balls, white or black or both kinds.
What is the probability of getting a white ball in the first draw-
ing, and if this event has happened and the ball replaced, what
is then the probability to get white in the following drawing?
Three conditions are here possible in the urn. There may be 0,
1, or 2 white balls. Each hypothetical condition has a proba-
bility of existence equal to 1/3, and the productive probabilities
for white are 0, 1/2 and 1 respectively. The total probability to
get white is therefore:

    (1/3)·0 + (1/3)·(1/2) + (1/3)·1 = 1/2.
If we now draw a white ball then the probabilities that it came
from the complexes F₁, F₂, F₃, respectively, are:

    (0·(1/3))/(1/2) = 0,  ((1/2)·(1/3))/(1/2) = 1/3,  (1·(1/3))/(1/2) = 2/3.

These are also the new existence probabilities of the three com-
plexes. The probability for white in the second drawing is therefore:

    0·0 + (1/3)·(1/2) + (2/3)·1 = 5/6.
This solution of the problem is, however, not a unique solution,
because it is an arbitrary solution. It is arbitrary in this respect,
that we have without further consideration given all three com-
plexes the same probability of existence, 1/3. We shall discuss
this part of the question under the chapter on the criticism of
Bayes's Rule.
Example 22. — An urn contains five balls of which a part is
known to be white and the rest black. A ball is drawn four
times in succession and replaced after each drawing. By three
of such drawings a white ball was obtained and by one drawing
a black ball. What is the probability that we will get a white
ball in the fifth drawing?
In regard to the contents of the urn the following four hypoth-
eses are possible:

    F₁: 4 white, 1 black ball,
    F₂: 3 white, 2 black balls,
    F₃: 2 white, 3 black balls,
    F₄: 1 white, 4 black balls.
Since we do not know anything about the ratio of distribution
of the different colored balls, we may by a direct application of
the principle of insufficient reason regard the four complexes as
equally probable, or:

    κ₁ = κ₂ = κ₃ = κ₄ = 1/4.

If either F₁, F₂, F₃ or F₄ exists, the respective productive
probabilities are:

    ω₁ = 4/5,  ω₂ = 3/5,  ω₃ = 2/5,  ω₄ = 1/5.

By a direct substitution in formula (IV) for n = 4 and m = 3
(α = 1, 2, 3, 4) we get:

    R = [(4/5)⁴(1/5) + (3/5)⁴(2/5) + (2/5)⁴(3/5) + (1/5)⁴(4/5)]
      ÷ [(4/5)³(1/5) + (3/5)³(2/5) + (2/5)³(3/5) + (1/5)³(4/5)] = 47/73.
45. Criticism of Bayes's Rule. — In most English treatises on
the theory of chance the "a posteriori" determination of a
mathematical probability is discussed under the so-called "in-
verse probabilities." This somewhat misleading name was prob-
ably first introduced by the eminent English mathematician and
actuary, Augustus de Morgan. In the opening of the discussion
of a posteriori probabilities in the third chapter of his treatise,
" An Essay on Probabilities," de Morgan says: " In the preceding
chapter, we have calculated the chances of an event, knowing the
circumstances under which it is to happen or fail. We are now
to place ourselves in an inverted position, we know the event,
and ask what is the probability which results from the event
in favor of any set of circumstances under which the same might
have happened." Is this now an inverse process? By the a
priori or — as de Morgan prefers to call them — the direct
probabilities, we started from a definitely known condition and
determined the probability for a future event, E, or what is the same,
the probability of a specified future state of affairs. Here we
start knowing the present condition and try to determine a past
condition. The process apparently appears to be the inverse of
the former, although they both are the same. We possess a
definite knowledge of a certain condition and try to determine
the probability of the existence of a specified state of affairs, in
general different from the first condition, but whether this state
of affairs occurred in the past or is to occur in the future has no
bearing on our problem. In other words, time does not enter
as a determining factor. And even if we were willing to admit
the two processes of the determination of the different probabil-
ities to be inverse, the probabilities themselves can not be said
to be inverse. Nevertheless, this misleading name appears over
and over again in examination papers in England and in America
as a thoroughly embalmed corpse which ought to have been
buried long ago. What is really needed, is a change of customary
nomenclature in the whole theory of probability. Instead of
direct and inverse, a priori and a posteriori probabilities, it would
be more proper to speak about " prospective " and " retro-
spective " probabilities in the application of Bayes's Rule. All
probabilities are in reality determined by an empirical process.
That there is a certain probability to throw a six with a die we
only know after we have formed a definite conception of a die.
The only probabilities which we perhaps rightly may name a
priori are the arbitrary probabilities in purely mathematical
problems where we assume an ideal state of affairs. "There
is," to quote the Danish writer on logic, Dr. Kroman, "really
more reason to doubt the a priori than the a posteriori probabil-
ities, and it would be more natural and also more exact in the
application of Bayes's Rule to speak about the actual or original
and the new or gained probability."
The discussion above has really no direct bearing on Bayes's
Rule but was introduced in order to give the student a clearer
understanding of the main principles underlying the whole deter-
mination of a posteriori probabilities by means of actual experi-
mental observations, and also to remove some obscure points.
From his ordinary mathematical training every student of mathe-
matics has an almost intuitive understanding of an inverse process.
Naturally when he encounters again and again the customary
heading: " inverse probabilities " in text-books he obtains from
the very start — almost before he starts to read this particular
chapter — an inverse idea of the subject instead of the idea he really
ought to have. Nowhere in continental texts on the theory of
probabilities, will the reader be able to find the words direct and
inverse applied in the same sense as in English texts since the
introduction of these terms by de Morgan. We shall advise
readers who have become accustomed to the old terms to pay
no serious attention to them.
46. Theory Versus Practice. — In § 41 we reduced Bayes's
Rule to its most general form:

    Q₁ = κ₁·ω₁^m·(1 − ω₁)^(n−m) / Σκ_α·ω_α^m·(1 − ω_α)^(n−m)    (α = 1, 2, 3, ...).
This is an exact expression for the rule, but it is at the same
time almost impossible to employ it in practice. Only in a few
exceptional cases do we know, a priori, the different values of the
often numerous probabilities of existence k^, of the complexes
F^, and in order to apply the rule with exact results we require
here sufficient facts and information about the different complexes
of causes from which the observed event, E, originated.
Bayes deduced the rule from special examples resulting from
drawings of balls of different colors from an urn where the different
complexes of causes were materially existent. The probability
of a cause or a certain complex of causes did not here mean the
probability of existence of such a complex but the probability
that the observed event originated from this particular complex.
In order to elucidate this statement we give the following simple
example:
Example 23. — A bag contains 4 coins, of which one is coined
with two heads, the other three having both head and tail. A
coin is drawn at random and tossed four times in succession and
each time head turns up. What is the probability that it was
the coin with two heads?
The two complexes F₁ and F₂, which may produce the event,
E, are: F₁, the coin with two heads, and F₂, an ordinary coin.
The probability of existence of F₁ is the probability of drawing
the single coin with two heads, which is equal to 1/4; the probability
of existence for the other complex, F₂, is equal to 3/4. The
respective productive probabilities are 1 and 1/2. Thus κ₁ = 1/4,
κ₂ = 3/4, ω₁ = 1 and ω₂ = 1/2. Substituting these values in formula
(I) (n = 4, m = 4), we get:

    Q₁ = (1/4 × 1⁴) ÷ (1/4 × 1⁴ + 3/4 × (1/2)⁴) = (1/4) ÷ (19/64) = 16/19.
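The result of example 23 can again be checked mechanically (a sketch with our own variable names, using exact fractions):

```python
from fractions import Fraction

kappa = [Fraction(1, 4), Fraction(3, 4)]  # two-headed coin, ordinary coin
omega = [Fraction(1), Fraction(1, 2)]     # productive probabilities of heads
m, n = 4, 4                               # four heads in four tosses

# Formula (I); the binomial coefficients cancel.
terms = [k * w**m * (1 - w)**(n - m) for k, w in zip(kappa, omega)]
q1 = terms[0] / sum(terms)
# q1 == 16/19
```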
But in most cases we do not know anything about the material
existence of the complexes of causes from which the event, E,
originated. On the contrary, we are forced to form a hypothesis
about their actual existence. To start with a simple case we take
example 21 of § 44.
We assumed here three equally possible conditions in the urn
before the drawings, namely the presence of 0, 1, or 2 white balls.
From this assumption we found the probability to get a white
ball in the second drawing, after we had previously drawn a white
ball and then put it back in the urn before the second drawing,
to be equal to 5/6. As we already remarked, this solution is not
unique because it is an arbitrary solution. It is arbitrary to
assign, without any consideration whatsoever, 1/3 as the probability
of existence to each of the three conditions. Let us suppose
that each of the two balls bore the numbers 1 and 2 respectively.
We may then form the following equally likely conditions:

    b₁b₂,  b₁w₂,  b₂w₁,  w₁w₂,

each condition having an a priori probability of existence equal
to 1/4 and a productive probability for the drawing of a white
ball equal to 0, 1/2, 1/2 and 1 respectively. Thus:

    κ₁ = κ₂ = κ₃ = κ₄ = 1/4

and

    ω₁ = 0,  ω₂ = 1/2,  ω₃ = 1/2,  ω₄ = 1.

The respective a posteriori probabilities, that is the new or
gained probabilities of the four hypothetical conditions, become
now by the application of Bayes's Rule (Formula II):

    Q₁ = 0,  Q₂ = 1/4,  Q₃ = 1/4,  Q₄ = 1/2.

Hence the probability for white in the second drawing is:

    0·0 + (1/4)·(1/2) + (1/4)·(1/2) + (1/2)·1 = 3/4.
In the first solution we got 5/6 for the same probability. Which
answer is now the true one? Neither one! The true answer to
the problem is that it is not given in such a form that the last
question — the probability of getting a white ball in the second
drawing — may be settled without any doubt. The answer must
be conditional. Following the first hypothesis we got 5/6, while
the second hypothesis gives 3/4 as the answer.
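The two hypothetical schemes may be compared side by side with a short computation (the helper `white_in_second_draw` is our own construction, not part of the text):

```python
from fractions import Fraction

def white_in_second_draw(hypotheses):
    # hypotheses: list of (kappa, omega) pairs. P(white on the first draw)
    # is sum kappa * omega; the a posteriori weight of each complex is
    # kappa * omega / total, and the chance of white on the second draw
    # is the weighted sum of the omegas.
    total = sum(k * w for k, w in hypotheses)
    return sum(k * w * w for k, w in hypotheses) / total

f = Fraction
# First hypothesis: 0, 1 or 2 white balls, each with kappa = 1/3.
h1 = [(f(1, 3), f(0)), (f(1, 3), f(1, 2)), (f(1, 3), f(1))]
# Second hypothesis: numbered balls b1b2, b1w2, b2w1, w1w2, each kappa = 1/4.
h2 = [(f(1, 4), f(0)), (f(1, 4), f(1, 2)), (f(1, 4), f(1, 2)), (f(1, 4), f(1))]
# white_in_second_draw(h1) == 5/6, white_in_second_draw(h2) == 3/4
```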
We next proceed to example 22 which is almost identical in
form to the first one, the only difference being a greater variety
of hypothetical conditions. We started here with the following
four hypotheses:
F₁: 4 white, 1 black ball; F₂: 3 white, 2 black; F₃: 2 white, 3
black; and F₄: 1 white and 4 black balls, assigning 1/4 as the hy-
pothetical existence probability of each.
By marking the 5 balls, similarly as in the last example, with
the numbers from 1 to 5, we may form the complexes:

    F₁: 4 white and 1 black ball in (5 choose 4) = 5 ways,
    F₂: 3 white and 2 black balls in (5 choose 3) = 10 ways,
    F₃: 2 white and 3 black balls in (5 choose 2) = 10 ways,
    F₄: 1 white and 4 black balls in (5 choose 1) = 5 ways.
This gives us a total of 5 + 10 + 10 + 5 = 30 different
complexes. Assuming all of these complexes equally likely
to occur, we get the following probabilities of existence and pro-
ductive probabilities:

    κ₁ = κ₂ = κ₃ = ... = κ₃₀ = 1/30,
    ω₁ = ω₂ = ω₃ = ω₄ = ω₅ = 4/5 (productive prob. for F₁),
    ω₆ = ω₇ = ω₈ = ... = ω₁₅ = 3/5 (productive prob. for F₂),
    ω₁₆ = ω₁₇ = ... = ω₂₅ = 2/5 (productive prob. for F₃),
    ω₂₆ = ω₂₇ = ω₂₈ = ω₂₉ = ω₃₀ = 1/5 (productive prob. for F₄).

The total probability of getting a white ball in the fifth
drawing is now, by formula (IV):

    R = Σω_α⁴·(1 − ω_α) / Σω_α³·(1 − ω_α)    (α = 1, 2, 3, ..., 30).

Actual substitution of the above values of ω in this formula
gives us the final result as R = 17/28.
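The substitution may be verified by listing the thirty complexes explicitly (a sketch; the variable names are ours):

```python
from fractions import Fraction

# Five complexes with omega = 4/5, ten with 3/5, ten with 2/5, five with 1/5;
# the thirty equal kappas (1/30) cancel in formula (IV).
omegas = ([Fraction(4, 5)] * 5 + [Fraction(3, 5)] * 10 +
          [Fraction(2, 5)] * 10 + [Fraction(1, 5)] * 5)
m, n = 3, 4
num = sum(w**(m + 1) * (1 - w)**(n - m) for w in omegas)
den = sum(w**m * (1 - w)**(n - m) for w in omegas)
r = num / den
# r == 17/28
```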
47. Probabilities Expressed by Integrals. — By making an ex-
tended use of the infinitesimal calculus, Mr. Bing and Dr. Kroman
in their memoirs arrived at much more ambiguous results through
an application of the rule of Bayes. Starting with the funda-
mental rule as given in equation (I) in § 41, we may at times en-
counter somewhat simpler conditions inside the domain of
causes. The total complex of actions may embrace a large
number of smaller sub-complexes construed in such a way that
the change from one complex to another may be regarded as a
continuous process, so that the productive probabilities are
increased by an infinitely small quantity from a certain lower
limit, a, to an upper limit, b. Denoting such continuously in-
creasing productive probabilities by v and the corresponding small
probabilities of existence by u·dv, we have as the total probability of
obtaining E from any one of the minor complexes with a pro-
ductive probability between α and β (a ≤ α, β ≤ b):

    ∫_α^β u·v·dv.
The probability that when E has happened it originated from
one of those minor complexes, or the probability of existence of
some one of those complexes, is:

    P = ∫_α^β u·v·dv / ∫_a^b u·v·dv.
The situation may be still more simplified by the following con-
siderations. In the continuous total complex between the limits
a and b we have altogether situated (b − a)/dv individual minor
complexes. If we assume all of these complexes to possess the
same probability of existence, we must have:

    u·dv = dv/(b − a).

The two formulas then take on the form:

    1/(b − a) · ∫_α^β v·dv

and

    P = ∫_α^β v·dv / ∫_a^b v·dv.

A still more specialized form is obtained by letting a = 0 and
b = 1, which gives:

    ∫_α^β v·dv  and  P = ∫_α^β v·dv / ∫₀¹ v·dv.
The above formulas may perhaps be made more intelligible
to the reader by a geometrical illustration.
Let the various productive probabilities, v, be plotted along the
x axis in a Cartesian coordinate system in the interval from a
to b (a < b). To any one of these probabilities, say v_r, there
corresponds a certain probability of existence, u_r, represented
by a y ordinate. In the same manner the next following pro-
ductive probability, v_(r+1), will have a probability of existence
represented by an ordinate u_(r+1). It is now possible to represent
the various u's by means of areas instead of line ordinates. Thus
the probability of existence, u_r, is in the figure represented by
the small shaded rectangle, with a base equal to

    v_(r+1) − v_r = Δv_r,

and an altitude of u_r, the total area being equal to u_r·Δv_r. That
this is so follows from the well-known elementary theorem of
geometry that areas of rectangles with equal bases are directly
proportional to their altitudes. The sum of the different u's is
thus in the figure represented as the sum of the areas of the various
small rectangles in the staircase-shaped histograph. Now, ac-
cording to our assumption, u is a continuous function of v in the
interval from a to b. We may, therefore, divide this interval,
b − a, into n smaller equal intervals. Let

    v_(r+1) − v_r = Δv_r = (b − a)/n

be one of these smaller divisions. By choosing n sufficiently
large, (b − a)/n or Δv becomes a very small quantity, and by
letting n approach infinity as a limiting value we have

    lim u·Δv = u·dv.

In this case the histograph is replaced by a continuous curve and
u·dv is the probability of existence that the productive probability
is enclosed between v and v + dv.¹
The probability to get E from any one of the complexes is
evidently given by the total area of the small rectangles, or in
the continuous case by means of the integral:

    ∫_a^b u·v·dv.
¹ A more rigorous analysis would be as follows: We plot along the abscissa
axis intervals of the length ε so that the middle of each interval has a distance
from the origin equal to an integral multiple of ε. If now ε is chosen suffi-
ciently small, we may regard the probability of existence u for values of
the variable v between rε − ε/2 and rε + ε/2 as a constant, and the probability
that v falls between the limits rε − ε/2 and rε + ε/2 may hence be expressed as
εu. When ε approaches 0 as a limiting value this expression becomes u·dv.
See the similar discussion under frequency curves.
In the same way the probability that E originated from any
of the complexes between α and β is:

    ∫_α^β u·v·dv / ∫_a^b u·v·dv.

The special case a = 0 and b = 1 needs no further commentary.
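The passage from the staircase histograph to the integral can be illustrated numerically (a sketch assuming, as in the text, that u is constant on the interval, so that P = ∫_α^β v·dv / ∫₀¹ v·dv = β² − α²; the helper `riemann` is our own):

```python
def riemann(f, a, b, n=200000):
    # Left-endpoint Riemann sum of f over [a, b]: the area of the staircase
    # histograph, which tends to the integral as the rectangles shrink.
    h = (b - a) / n
    return sum(f(a + i * h) for i in range(n)) * h

alpha, beta = 0.3, 0.6
p = riemann(lambda v: v, alpha, beta) / riemann(lambda v: v, 0.0, 1.0)
# p is close to the exact value beta**2 - alpha**2 = 0.27
```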
We are now in a position to consider the examples of Bing and
Kroman. Any student familiar with multiple integration will
find no difficulty in the following analysis. For the benefit of
readers to whom the evaluation of the various integrals may seem
somewhat difficult, we may refer to the addenda at the close of
this treatise or to any standard treatise on the calculus as, for
instance, Williamson's " Integral Calculus."
48. Example 24. — An urn contains a very large number of
similarly shaped balls. In 10 successive drawings (with replace-
ments) we have obtained 7 balls with the number 1, 2 with the
number 2, and one having the number 3. What is the probability
to obtain a ball with another number in the following drawing?
We must here distinguish between 4 kinds of balls, namely
balls marked 1, 2, 3, or "other balls." A general scheme of
distribution of the balls in the urn may be given through the
following scheme:

    nx balls marked with the number 1,
    ny balls marked with the number 2,
    nz balls marked with the number 3, and
    nt = n(1 − x − y − z) other balls.

Here x, y, z and t represent the respective productive probabil-
ities. If we now let all such probabilities assume all possible
values between 0 and 1 with intervals of 1/n, we obtain the pos-
sible conditions in the total complex of actions. Each of these
conditions has a probability of existence, s, and the productive
probabilities x, y, z, and 1 − x − y − z. The original probability
for 7 ones, 2 twos and 1 three in 10 drawings is:

    Σ s · (10!/(7!·2!·1!)) · x⁷·y²·z,

the summation extending over all possible conditions.
Now when n is a very large number the interval 1/n becomes a
very small quantity, and we may approximately write:

    s = u·dx·dy·dz,

and also write the above sum as a triple integral:

    (10!/(7!·2!·1!)) · ∫₀¹ ∫₀^p ∫₀^q u·x⁷·y²·z·dx·dy·dz,

where

    p = 1 − x  and  q = 1 − x − y.
If now the above event has happened, then the probability to get
a differently marked ball in the 11th drawing is:

    Q = ∫₀¹ ∫₀^p ∫₀^q u·x⁷·y²·z·(1 − x − y − z)·dx·dy·dz
      ÷ ∫₀¹ ∫₀^p ∫₀^q u·x⁷·y²·z·dx·dy·dz.
It is, however, quite impossible to evaluate the above integral
without knowing the form of the function u; but unfortunately
our information at hand tells us absolutely nothing in regard to
this. Perhaps the balls bear the numbers 1, 2 and 3 only, or
perhaps there is an equal distribution up to 10,000 or any other
number. Our information is really so insufficient that it is quite
hopeless to attempt a calculation of the a posteriori probability.
Many adherents of the inverse probability method venture,
however, boldly forth with the following solution based upon the
perfectly arbitrary hypothesis that all the u's are of equal magni-
tude. This gives the special integral:

    Q = ∫₀¹ ∫₀^p ∫₀^q x⁷·y²·z·(1 − x − y − z)·dx·dy·dz
      ÷ ∫₀¹ ∫₀^p ∫₀^q x⁷·y²·z·dx·dy·dz,

where once more it must be remembered that

    x + y + z ≤ 1.

In this case the limits of x are 0 and 1, those of y are 0 and
1 − x, and those of z are 0 and 1 − x − y.
This is a well-known form of the triple integral which may be
evaluated by means of Dirichlet's Theorem:

    ∫₀¹ ∫₀^(1−x) ∫₀^(1−x−y) x^(l−1)·y^(m−1)·z^(n−1)·dz·dy·dx
      = Γ(l)·Γ(m)·Γ(n) / Γ(1 + l + m + n).

(See Williamson's Calculus.)
Remembering the well-known relation between gamma func-
tions and factorials, viz. Γ(n + 1) = n!, we find by a mere
substitution in the integral the value of the probability in
question to be 1 : 14. Another and equally plausible result is
obtained by a slightly different wording of the problem.
Ten successive drawings have resulted in balls marked 1, 2,
or 3. What is the probability to obtain a ball not bearing such
a number in the 11th drawing? This probability is given by
the formula:

    ∫₀¹ v¹⁰·(1 − v)·dv / ∫₀¹ v¹⁰·dv = 1 : 12.

Quite a different result from the one given above.
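Both evaluations follow from Dirichlet's theorem restated through factorials (a sketch; `simplex_integral` is our own helper, its last exponent belonging to the factor 1 − x₁ − ... − x_k):

```python
from math import factorial
from fractions import Fraction

def simplex_integral(*exps):
    # Dirichlet's theorem with integer exponents: the integral of
    #   x1^a1 * ... * xk^ak * (1 - x1 - ... - xk)^a_last
    # over the simplex x_i >= 0, x1 + ... + xk <= 1 equals
    #   a1! * ... * a_last! / (a1 + ... + a_last + k)!,
    # k being the number of integration variables.
    k = len(exps) - 1
    num = 1
    for a in exps:
        num *= factorial(a)
    return Fraction(num, factorial(sum(exps) + k))

# First wording: 7 ones, 2 twos, 1 three; a differently marked ball next.
q1 = simplex_integral(7, 2, 1, 1) / simplex_integral(7, 2, 1, 0)
# q1 == 1/14

# Second wording: only the total count of marked balls is used.
q2 = simplex_integral(10, 1) / simplex_integral(10, 0)
# q2 == 1/12
```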
49. Example 25 — Bing's Paradox. — A still more astonishing
paradox is produced by Bing when he gives an application of
Bayes's Rule to a problem from mortality statistics. A mortality
table gives the ratio of the number of persons living during a
certain period, to the number living at the beginning of this
period, all persons being of the same age. By recording the
deaths during the specified period (say one year) it has been
ascertained that of s persons, say forty years of age at the
beginning of the period, m have died during the period. The
observed ratio is then (s − m)/s. If s is a very large number this
ratio may (as we shall have occasion to prove at a later stage)
be taken as an approximation of the true ratio of probability of
survival during the period. If s is not sufficiently large the
believers in the inverse theory ought to be able to evaluate this
ratio by an application of Bayes's Rule, by means of an analysis
similar to the one that follows:
Let y be the general symbol for the probability of a forty-
year-old person being alive one year from hence. Each of such
persons will in general be subject to different conditions, and the
general symbol, y, will therefore have to be understood as the
symbol for all the possible productive probability values, changing
from 0 to 1 by a continuous process.
Assuming s a very large number, each condition will have a
probability of existence equal to u·dy. We may now ask: What
is the probability that the rate of survival of a group of s persons
aged 40 is situated between the limits α and β?
The answer according to Bayes's Rule is:

    ∫_α^β y^(s−m)·(1 − y)^m·u·dy / ∫₀¹ y^(s−m)·(1 − y)^m·u·dy.    (I)
Let us furthermore divide the whole year into two equal parts
and let y₁ be the probability of surviving the first half year,
y₂ the probability of surviving the second half, and u₁·dy₁,
u₂·dy₂ the corresponding probabilities of existence. Then the
respective a posteriori probabilities for y₁ and y₂ are:

    y₁^(s−m₁)·(1 − y₁)^(m₁)·u₁·dy₁ / ∫₀¹ y₁^(s−m₁)·(1 − y₁)^(m₁)·u₁·dy₁

and

    y₂^(s−m)·(1 − y₂)^(m₂)·u₂·dy₂ / ∫₀¹ y₂^(s−m)·(1 − y₂)^(m₂)·u₂·dy₂    (m₁ + m₂ = m)

(m₁ and m₂ represent the number of deaths in the respective half
years.) The probability that both y₁ and y₂ are true is then,
according to the multiplication theorem:

    y₁^(s−m₁)·(1 − y₁)^(m₁)·y₂^(s−m)·(1 − y₂)^(m₂)·u₁·u₂·dy₁·dy₂
      ÷ [∫₀¹ y₁^(s−m₁)·(1 − y₁)^(m₁)·u₁·dy₁ · ∫₀¹ y₂^(s−m)·(1 − y₂)^(m₂)·u₂·dy₂],

where y = y₁·y₂.
The probability that the probability of survival for a full
year, y, is situated between the limits α and β is therefore:

    P = ∫∫ y₁^(s−m₁)·(1 − y₁)^(m₁)·y₂^(s−m)·(1 − y₂)^(m₂)·u₁·u₂·dy₁·dy₂
      ÷ [∫₀¹ y₁^(s−m₁)·(1 − y₁)^(m₁)·u₁·dy₁ · ∫₀¹ y₂^(s−m)·(1 − y₂)^(m₂)·u₂·dy₂],    (II)

where the limits in the double integral in the numerator are de-
termined by the relation α < y₁·y₂ < β.
Choosing the principle of insufficient reason as the basis of
our calculations, merely assuming that all possible events are, in
the absence of any grounds for inference, equally likely, the
various quantities expressed by the general symbol, u, become
equal and constant and cancel each other in numerator and
denominator, which brings the a posteriori probabilities ex-
pressed by (I) and (II) to the forms:

    ∫_α^β y^(s−m)·(1 − y)^m·dy / ∫₀¹ y^(s−m)·(1 − y)^m·dy    (III)

and

    ∫∫ y₁^(s−m₁)·(1 − y₁)^(m₁)·y₂^(s−m)·(1 − y₂)^(m₂)·dy₁·dy₂
      ÷ [∫₀¹ y₁^(s−m₁)·(1 − y₁)^(m₁)·dy₁ · ∫₀¹ y₂^(s−m)·(1 − y₂)^(m₂)·dy₂],    (IV)

where the limits in the numerator in the latter expression are
determined by the relation: α < y₁·y₂ < β.
Letting

    y = y₁·y₂

and then

    1 − y₁ = z(1 − y),

this latter expression may, after a simple substitution, be brought
to a form — call it (V) — in which the numerator becomes a single
integral between the limits α and β, the denominator remaining
the product

    ∫₀¹ y₁^(s−m₁)·(1 − y₁)^(m₁)·dy₁ · ∫₀¹ y₂^(s−m)·(1 − y₂)^(m₂)·dy₂.

(See appendix.)
Mr. Bing now puts the further question: What is the probability
that a new person forty years of age, entering the original large
group of s persons, will survive one year, when we assume
m₁ = m₂ = 0? (III) gives the answer:

    ∫₀¹ y^(s+1)·dy / ∫₀¹ y^s·dy = (s + 1)/(s + 2).

Formula (V), on the other hand, gives us:

    [∫₀¹ y₁^(s+1)·dy₁ / ∫₀¹ y₁^s·dy₁] · [∫₀¹ y₂^(s+1)·dy₂ / ∫₀¹ y₂^s·dy₂]
      = ((s + 1)/(s + 2))².
As the above analysis is perfectly general, we might equally
well have applied it to each of the semi-annual periods, which
would give us an a posteriori probability of survival equal to
(s + 1)/(s + 2) for each half year, or a compound probability of
((s + 1)/(s + 2))² for the whole year. Extending this process it is
easily seen that by dividing the year into n parts, we shall have
((s + 1)/(s + 2))ⁿ as the final probability a posteriori that a forty-
year-old person will reach the age of forty-one. By letting n
increase indefinitely the above quantity approaches 0 as its limiting
value and we obtain thus the paradox of Bing:
If, among a large group of s equally old persons, we have observed
no deaths during a full calendar year, then another person of the
same age outside the group is sure to die inside the calendar year.
This is evidently a very strange result, and yet, working on
the basis of the principle of insufficient reason, the mathematical
deductions and formulas exhibit no errors.
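The march of the quantity ((s + 1)/(s + 2))ⁿ toward 0 is easily exhibited numerically (a sketch; the function name is ours):

```python
from fractions import Fraction

def survival_a_posteriori(s, n):
    # ((s + 1)/(s + 2))^n: the a posteriori probability of surviving the year
    # when it is split into n equal parts and no deaths were observed among
    # the s persons in any part.
    return Fraction(s + 1, s + 2) ** n

p1 = survival_a_posteriori(100, 1)        # 101/102, very near certainty
p1000 = survival_a_posteriori(100, 1000)  # already below 0.0001
```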
Mr. Bing disposes of the whole matter by simply denying the
validity and existence of a posteriori probabilities. Dr. Kroman,
on the other hand, defends Bayes's Rule. "Mathematics,"
Kroman says, "is — as Huxley has justly remarked — an ex-
ceedingly fine mill stone, but one must not expect to get wheat
flour after having put oats in the quern." According to the
Danish scholar the paradox is due to the use of a wrong formula.
We ought to have used the general formula (II) instead of formula
(V), which is a special case. In the general formula we encounter
the functions u, denoting the probabilities of existence of the
various productive probabilities y. As we do not know anything
about this function u it is hopeless to attempt a calculation. This
brings the criticism down to the fundamental question whether
we shall build the theory of probabilities on the principle of
"cogent reason" or the principle of "insufficient reason."
50. Conclusion. — Contradictory results of a similar kind to
the ones given above have led several eminent mathematicians
to a complete denunciation of the laws underlying a posteriori
probabilities. Professor Chrystal, especially, becomes extremely
severe in his criticism in the previously mentioned address before
the Actuarial Society of Edinburgh. He advises "practical
people like the actuaries, much though they may justly respect
Laplace, not to air his weaknesses in their annual examinations.
The indiscretions of a great man should be quietly allowed to be
forgotten." Although one may heartily agree with Professor
Chrystal's candid attack on the belief in authority, too often
prevailing among mathematical students, I think — aside from
the fact that the rule was originally given by Bayes — that the
great French savant has been accused unjustly, as the following
remarks perhaps may tend to show.
In our statement of Bayes's Rule, we followed an exact mathe-
matical method, and the final formula (I) is theoretically as
correct as any previously demonstrated in this work. The
customary definition of a mathematical probability as the
ratio of equally favorable to coordinated possible cases, is not
done away with in this new kind of probabilities; the former are
found in the numerator and the latter in the denominator; and
if we take care that each of the particular formulas, with its
definite requirements, is applied to its particular case, we do not
go beyond pure mathematics or logic. But are we able to get
complete and exact information about these requirements? In
the example of the tossing of a coin with two heads, this informa-
tion was at hand. Here we were able to enumerate exactly the
different mutually exclusive causes from which the observed
event originated. We were also able to determine the exact
quantitative measures for the probabilities, k, that these com-
plexes existed as well as the different productive probabilities, ω.
Here the most rigid requirements could be satisfied, and the rule
gave therefore a true answer.
In the other examples we encountered a different state of
affairs. Here we were not able to enumerate directly the dif-
ferent complexes of causes from which the event originated, but
were forced to form different and arbitrary hypotheses about the
complexes of origin, F, and each hypothesis gave, in general, a
diflTerent result. Furthermore, we assumed a priori that the
different probabilities of the actual existence of the complexes
were all equal in magnitude, and it was, therefore, the special
formula (II) we employed in the determination of the a posteriori
probabilities. In this formula, the different k's do not enter at
all as a determining factor; only the productive probabilities, ω,
are considered. The assumption that all the k's are equal in
magnitude is based upon the principle of insufficient reason, or,
as Boole calls it, "the equal distribution of ignorance."
The principle of equal distribution of ignorance makes in the
case of continuously varying productive probabilities, v, the
function, u, of the probabilities of existence of the various
complexes equal to a constant quantity. In other words, the
curve in Fig. 1, is replaced by a straight line of the form, u = k.
Now, as a matter of fact, we possess in most cases, some partial
knowledge of the complexes of action producing the event in
question. This partial knowledge — although far from complete
enough to make a rigorous use of formula (I) — is nevertheless
sufficient to justify us in discarding completely any general
hypothesis assuming such simple conditions as above. Such
partial knowledge is, for instance, found in the Paradox of Bing.
Here the rather absurd hypothesis was made that the possible
values of the probability of surviving a certain period were
equally probable. In other words, it is equally probable that
there will die 0, 1, 2, · · ·, or s persons in the particular period.
" Common sense, however, tells us that it is far more probable
that, for instance, 90 per cent, of a large number of forty-year-old
persons will survive the period than no one or every one will die
in the same period " (Kroman). The indiscreet use of formula
(II) therefore naturally leads to paradoxical results. On the
other hand, the fallacy of the happy-go-lucky computers, em-
ploying the special case (II) of Bayes's Rule, as well as the critics
of Laplace, lies in their failure to make a proper distinction
between " equal distribution of ignorance '' and " partial cogent
reason," which latter expression properly may be termed ^- an
unequal distribution of ignorance." If, despite the actual
presence of such unequal distribution of ignorance, we still insist
on using the special formula (II), which is only to be used in the
case of an equal distribution of ignorance, it is no wonder we
encounter ambiguous answers. Not the rule itself, its discoverer,
or Laplace, but the indiscreet computer is the one to blame.
Messrs. Bing, Venn and Chrystal, in their various criticisms, have
filled the quern with some rather " wild oats " and expected to
get wheat flour; and that one of those critics in his disappoint-
ment in not getting the expected flour should blame Laplace, is
hardly just.
So much for the principle of "equal distribution of igno-
rance." It may be of interest to see how matters turn out when
we, like von Kries, insist upon the principle of "cogent reason"
as the true basis of our computations. The reader will quite
readily see that a rigorous application of the Rule of Bayes in its
most general form as given by formula (I) really tacitly assumes
this very principle. In formula (I), we require not alone an
exact enumeration of the various complexes from which the
observed event may originate, but also an exact and complete
information about the structure of such complexes in order to
evaluate their various probabilities of existence. If such informa-
tion is present, we can meet even the most stringent requirements
of the general formula, and we will get a correct answer. But
in the vast majority of cases, not to say all cases, such information
is not at hand, and any attempt to make a computation by means
of Bayes's Rule must be regarded as hopeless. We may, how-
ever, again remark that we are very seldom in complete ignorance
of the conditions of the complexes, which is the same thing as
saying that we are not in a position to employ the principle of
equal distribution of ignorance in a rigorous manner. From
other experiments on the same kind of event, or from other
sources, we may have attained some partial information, even if
insufficient to employ the principle of cogent reason. Is such
information now to be completely ignored in an attempt to give
a reasonable, although approximate answer? It is but natural
that the mathematician should attempt to obtain as much of
such information as possible and use it in the evaluation of the
various probabilities of existence. Thus for instance, if, in the
Paradox of Bing, we had observed that the probability of survival
for a forty-year-old person never had been below .75 and never
above .95, it would be but reasonable to substitute those limits
in their proper integrals in order to attain an approximate answer.
To illustrate this somewhat subjective determination of an a
posteriori probability, we take another example from the memoirs
of Bing and Kroman.
Example (24). — A merchant receives a cargo of 100,000 pieces
of fruit. If every single fruit is untainted, the value of the cargo
may be put at 10,000 Kroner. On the other hand, any part of
the cargo more or less tainted is considered worthless. The
merchant has never before received a similar cargo and does not
know how the fruit has been affected by travel. As samples, he
has selected 30 pieces picked at random from the cargo and all
samples proved to be fresh. He asks a mathematician what
value he can put on the cargo.
If the mathematician uses the special formula (II), assum-
ing an equal distribution of ignorance, therefore assuming that
it is equally probable that for example none, 5,000 or all the
individual pieces of fruit were untainted, the answer is:

    10,000 × 31/32 = 9,687.5 Kroner.
If we use the true rule, the a posteriori probability of the whole-
someness of the cargo is given by the integral:

    ∫_0^1 u v^{31} dv / ∫_0^1 u v^{30} dv,
where v is the general expression for a possible probability of
wholesomeness between 0 and 1 and u dv the corresponding proba-
bility of existence. Now if the mathematician has no complete
information as to this particular function, u, it would be foolish
of him to attempt a calculation, since the hypothesis of an equal
probability of existence for all possible values of v evidently
gives an arbitrary and perhaps a very erroneous result. On
the other hand, the computer may possibly have access to some
partial information. Perhaps the merchant has received fruit
of a similar kind or heard about cargoes of this particular kind
of fruit received by other dealers. If now the merchant were
able to inform the computer that in a great number of similar
cases the probability of wholesomeness had been between 0.9
and 1 with an approximately even distribution, while it never
had been below 0.9, then nothing would hinder the mathematician
from presenting the following computation:
    ∫_{0.9}^1 v^{31} dv / ∫_{0.9}^1 v^{30} dv = 0.9726,
and tell the merchant that on the basis of the information given
9,726 Kroner would be a fair price for the cargo.
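Both valuations above can be checked numerically. The following is a minimal sketch (the helper name and its closed-form evaluation of the integrals are ours, not the text's), resting on the elementary fact that the integral of v^n is v^{n+1}/(n + 1):

```python
def posterior_ratio(s, lo, hi):
    """Ratio of the integral of v^(s+1) to that of v^s over [lo, hi],
    i.e. the a posteriori probability of wholesomeness when the
    function u is constant on [lo, hi] and zero elsewhere."""
    num = (hi ** (s + 2) - lo ** (s + 2)) / (s + 2)   # integral of v^(s+1)
    den = (hi ** (s + 1) - lo ** (s + 1)) / (s + 1)   # integral of v^s
    return num / den

# Equal distribution of ignorance over [0, 1]: factor 31/32
print(10000 * posterior_ratio(30, 0.0, 1.0))   # 9687.5
# Partial knowledge: v uniformly distributed on [0.9, 1]
print(10000 * posterior_ratio(30, 0.9, 1.0))   # ≈ 9726
```

With s = 30 fresh samples the two assumptions about u differ by roughly 38 Kroner, which is precisely the gap between the two valuations discussed in the text.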
This is really the point of view taken by the English mathe-
matician, Professor Karl Pearson, one of the ablest writers on
mathematical statistics of the present time, when he says: "I
start, as most writers on mathematics have done, with 'the
equal distribution of ignorance' or I assume the truth of Bayes's
Theorem. I hold this theorem not as rigidly demonstrated, but
I think with Edgeworth that the hypothesis of the equal dis-
tribution of ignorance is, within the limits of practical life,
justified by our experience of statistical ratios, which are unknown,
i. e., such ratios do not tend to cluster markedly round any
particular point."
To sum up the above remarks: Theoretically Bayes's Rule is
true. If we are able to enumerate and determine the probabilities
of existence of the complexes of origin it will also give true
results in practice. If we are justified in assuming the principle
of " insufficient reason " or " equal distribution of ignorance *'
as the basis for our calculations, formula (II) may be employed
with exact results after a rigid enumeration of the complexes.
If the principle of " cogent reason " is required as the basis, an
exact computation is in general hopeless, and we can only after
having obtained partial subjective information give an approxi-
mate answer.
With these remarks we shall conclude the elementary dis-
cussion of the merely theoretical part of the subject. The follow-
ing chapters require in most cases a knowledge of the infinitesimal
calculus, and many of the questions discussed above will appear
in a new and instructive light by this treatment.
CHAPTER VII.
THE LAW OF LARGE NUMBERS.
51. A Priori and Empirical Probabilities. — In the previous
chapters we limited ourselves to the discussion of such mathe-
matical probabilities, where we, a priori, on account of our
knowledge of the various domains or complexes of actions, were
able to enumerate the respective favorable and unfavorable
possibilities associated with the occurrence or non-occurrence of
the event in question. " The real importance of the theory of
probability in regard to mass phenomena consists, however,
in determining the mathematical relations of the various proba-
bilities not in a deductive, but in an empirical manner — without an
a priori exhaustive knowledge of the mutual relations and actions
between cause and effect — by means of statistical enumeration
of the frequency of the observed event. The conception of a
probability finds its justification in the close relation between the
mathematical probabilities and relative frequencies as determined in
a purely empirical way. This relation is established by means
of the famous Law of Large Numbers " (A. A. Tschuprow).
To return to our original definition of a mathematical proba-
bility as the ratio of the favorable to the coordinated equally
possible cases, we first notice that this definition is wholly
arbitrary like many mathematical definitions. The contention
of Stuart Mill that every definition contains an axiom is rather
far stretched. In mathematics a definition does not necessarily
need to be metaphysical. A striking example is offered in
mechanics by the definitions of force as given by Lagrange and
Kirchhoff. What is force? " Force," Lagrange says, " is a
cause which tends to produce motion." Kirchhoff on the other
hand tells us that force is the product of mass and acceleration.
Lagrange's definition is wholly metaphysical. Whenever a
definition is to be of use in a purely exact science such as mathe-
matics, it must teach us how to measure the particular phe-
nomena which we are investigating. Thus, to quote Poincaré,
" it is not necessary that the definition tells lis what force really
is, whether it is a cause or the efifect of motion/'
An analogous case is offered in the criticism of a mathematical
probability as defined by Laplace, and the attempts to place
the whole theory of probabilities on a purely empirical basis by
Stuart Mill, Venn and Chrystal. These writers contend " that
probability is not an attribute of any particular event happening
on any particular occasion. Unless an event can happen, or
be conceived to happen a great many times, there is no sense in
speaking of its probability." The whole attack is directed against
the definition of a mathematical probability in a single trial,
which definition, evidently by the empiricists, is regarded as
having no sense. The word " sense " must evidently be con-
sidered as having a purely metaphysical meaning. In the same
manner Kirchhoff's definition might be dismissed as having no
sense, since it would seem as difficult to conceive force as a purely
mathematical product of two factors, mass and acceleration, as
it is to conceive the definition of a mathematical probability
as a ratio.
The metaphysical trend of thought of the above writers is
shown in their various definitions of the probability of an event.
Mill defines it merely as the relative frequency of happenings
inside a large number of trials, and Venn gives a similar defini-
tion, while Chrystal gives the following:
" If, on taking any very large number N out of a series of cases
in which an event, E, is in question, E happens on pN occasions,
the probability of the event, E, is said to be p."
Let us, for a moment, look more closely into these statements.
Any definition, if it bears its name rightly, must mean the same
to all persons. Now, as a matter of fact, the vagueness in a
half-metaphorical term like "any very large number" illustrates
its weakness. The question immediately confronts us: "What is
a very large number?" Is it 100, 1,000 or perhaps 1,000,000?
A fixed universal standard for the value of N seems out of the
question and the definition — although perhaps readily grasped
in a " general way " — can hardly be said to be happily chosen.
Another, and perfectly rigorous definition, is the following one
given by the Danish astronomer and actuary, T. N. Thiele.
Thiele tells us that "common usage" has assigned the word
probability as the name "for the limiting value of the relative
frequency of an event, when the number of observations (trials),
under which the event happens, approaches infinity as a limit."
A similar definition is later on given by the American actuary
R. Henderson, who says: "The numerical measure which has been
universally adopted for the probability of an event under given
circumstances is the ultimate value, as the number of cases is
indefinitely increased, of the ratio of the number of times the
event happens under those circumstances to the total possible
number of times." There is nothing ambiguous or vague in these
definitions. Infinity, taken in a purely quantitative sense, has a
perfectly uniform meaning in mathematics. The new definition
differs, however, radically from our customary definition of a
mathematical a priori probability. We cannot, therefore, agree
with Mr. Henderson when he continues " the measure there given
has been universally adopted and this holds true in spite of the
fact that the rule has been stated in ways which on their face differ
widely from that above given. The one most commonly given
is that if an event can happen in a ways and fail in b ways all of
which are equally likely, the probability of the event is the ratio
of a to the sum of a and b. It is readily seen that if we read
into this statement the meaning of the words " equally likely," this
measure, so far as it goes, reduces to a particular case of that given
above."
In order to investigate this statement somewhat more closely,
let us try to measure the probability of throwing head with an
ordinary coin by both our old definition of a mathematical
probability and the definition by Mr. Henderson of what we
shall term an empirical probability. Denoting the first kind of
probability by P(E) and the second by P'(E) we have in ordinary
symbols

    P(E) = 1/2,
    P'(E) = lim_{v→∞} F(E, v),

where the symbol F(E, v) denotes the relative frequency of the
event, E, in v total trials. No a priori knowledge will tell us
offhand if P'(E) will approach 1/2 as its ultimate value. The
two methods are radically different. By the first method the
determination of the numerical measure of a probability depends
simply on our ability to judge and segregate the equally possible
cases into cases favorable and unfavorable to the event E. By
the second method the determination of the probability depends,
not alone on the segregation and consequent enumeration of the
favorable from the total cases, but chiefly on the extent of our
observations or trials on the event in question.
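The contrast between the two methods can be made concrete by simulation; the sketch below uses a seeded pseudo-random generator standing in for the physical coin (the function name, seed, and trial counts are our own illustrative choices):

```python
import random

def relative_frequency(trials, p=0.5, seed=1):
    """F(E, v): relative frequency of the event in `trials` Bernoulli trials."""
    rng = random.Random(seed)
    hits = sum(rng.random() < p for _ in range(trials))
    return hits / trials

# F(E, v) is only an approximation of the a priori value P(E) = 1/2
# for finite v, but the approximation improves as v grows.
for v in (100, 10_000, 1_000_000):
    print(v, relative_frequency(v))
```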
52. Extent and Usage of Both Methods. — Before entering into
a more detailed discussion of the actual quantitative comparison
of the two methods, it might be of use to compare their various
extent of usage. In this respect the empirical method is vastly
superior to the a priori. A rigorous application of the a priori
method, as far as concrete problems go, is limited to simple
games of chance. As soon as we begin to tackle sociological or
economical practical problems it leaves us in a helpless state.
If we were to ask about the probability that a certain person
forty years of age would die inside a year, it would be of little use
to try to determine this in an a priori manner. Even a purely
deductive process, as illustrated by Bayes's Rule in the earlier
chapters, leads to paradoxical results. Our a priori knowledge
of the complexes of causes governing death or survival is so
incomplete that even a qualitative — not to speak of a quanti-
tative — judgment is out of the question. The empirical method
shows us at least a way to obtain a measure for the probability
of the event in question. By observing during a period of a year
an infinite number of forty-year-old persons of whom, after an
exhaustive qualitative investigation, we are led to believe that
their present conditions as far as health, social occupation, en-
vironments, etc., are concerned are equally similar, we may by
an enumeration of those who died during the year obtain the
desired ratio as defined by P'(E). Of course, observation of
an infinite number is practically impossible. An approximate
ratio may be formed by taking a finite, but a large, number
of cases under observation. But how large a number? This
very question leads directly to another problem, namely
the quantitative determination of the range of variance between
the approximate ratio and the ideal ultimate ratio as defined by
the relation
    P'(E) = lim_{v→∞} F(E, v).
Since it is impossible to make an infinite number of observations
we cannot find the exact value of the range of such variations.
But we may, however, determine the probability that this range
does not exceed a certain fixed quantity, say λ, in absolute mag-
nitude. Stated in compact form our problem reduces to the
following form: To determine the probability of the existence
of the following inequality:

    | lim_{v→∞} F(E, v) − a/s | ≤ λ,

where both a and s are finite numbers. This, to a certain extent,
contains in a nutshell some of the most important problems in
probabilities.
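The compact problem stated above can itself be explored empirically. A rough sketch that estimates the probability of the inequality for a known p by repeating finite series of trials (all names and parameter values are illustrative assumptions):

```python
import random

def prob_within(p, s, lam, repeats=10_000, seed=2):
    """Estimate the probability that |p - a/s| <= lam, where a is the
    number of happenings of the event in s Bernoulli(p) trials."""
    rng = random.Random(seed)
    good = 0
    for _ in range(repeats):
        a = sum(rng.random() < p for _ in range(s))
        if abs(p - a / s) <= lam:
            good += 1
    return good / repeats

# For p = 1/2, s = 100 and lam = 0.05 the inequality holds most of the time.
print(prob_within(0.5, 100, 0.05))
```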
The above problem may be solved in two distinct ways. The
first, and perhaps the most logical way, is by a direct process.
This is the method followed by T. N. Thiele in his "Almindelig
Iagttagelseslære,"¹ published in Copenhagen, 1889, a most
original work, which moves along wholly novel lines. Thiele
distinguishes between (1) Actual observation series as recorded
from observation, in other words statistical data. (2) Theoret-
ical observation series giving the conclusions as to the outcome of
future observations and (3) Methodical laws of series where the
number of observations is increased indefinitely. By such a
process, purely a theory of observations, the whole theory of
probability becomes of secondary importance and rests wholly
upon the theory of observed series, a fact thoroughly emphasized
by Thiele himself. When the author first, in the closing chapters
of his book, makes use of the word probability it is only because
"common usage" has assigned this word as the name for the
ultimate frequency ratio designated by our symbol lim_{v→∞} F(E, v).
The problem may, however, be solved in an indirect way,
which is the one I shall adopt. This method, as first consistently
deduced by Laplace, has for its basis our original definition of a
mathematical a priori probability and may be briefly sketched as
follows: We first of all postulate the existence of an a priori
1 English edition, "Theory of Observations," London, 1905.
probability as defined, although its actual determination by a
priori knowledge is impossible except in a few cases, as, for
instance, simple games of chance, drawing balls from urns, etc.
Denoting such a probability by P(E), or p, we next ask: What will
be the expected number, say a, of actual happenings of the event,
E, expressed in terms of s and p, when we make s consecutive
trials instead of a single trial, and what will be the number of
happenings of E when s approaches infinity as its ultimate value?
If such a relation is found between p, a and s, where p is the
unknown quantity, we have also found a means of determining
the value of p in known quantities. Our next question is —
What is the probability that the absolute value of the difference
between p and the relative frequency of the event as expressed
by the ratio of a to s does not exceed a previously assigned
quantity? Or the probability that

    | p − a/s | ≤ λ?
Now, as the reader will see later, we shall prove that
    lim_{v→∞} F(E, v) = P(E) = p.
It must, however, be remembered that this result is reached by a
mathematical deduction, based upon the postulate of mathe-
matical probabilities, and not in the manner as suggested in the
above statement by Mr. Henderson.
It is only after having established such purely quantitative
relations that we are entitled to extend the laws of mathematical
probabilities as deduced in the earlier chapters to other problems
than the simple problems of games of chance.
53. Average a Priori Probabilities. — In the previous para-
graphs of this chapter, another important matter is to be noted,
namely the assumption that the complex of causes producing
the event in question remains constant during the repeated
trials (observations), or, stated in other words the mathematical
a priori probability remains constant. Under this limitation
the extension of the laws of mathematical probabilities would
have but a very limited practical application. In all statistical
mass phenomena such an ideal state of affairs is rather a very
rare exception. If we consider an ordinary mortality investiga-
tion we know with absolute certainty that no two persons are
identically alike as far as health, occupation, environment and
numerous other things are concerned. Thus the postulated
mathematical probability for death or survival during a whole
calendar year will in general be different for each person. We
may, however, conceive an average probability of survival for a
full year defined by the relation

    p_0 = (p_1 + p_2 + p_3 + · · · + p_s) / s = Σp / s,

where p_1, p_2, p_3, · · · are the postulated probabilities of each
individual under observation. Our task is now to find:
1. An algebraic relation between the average probability as
defined above, the absolute frequency a and the total number of
observations (trials) s,
2. The same relation when s approaches infinity as its ultimate value,
3. The probability of the existence of the inequality,

    | p_0 − a/s | ≤ λ,

where a denotes the absolute frequency of the occurrence of the
event, s the total number of observations (trials) and λ an ar-
bitrary constant.
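The average probability p_0 defined above admits a trivial computational sketch (the individual survival probabilities below are invented purely for illustration):

```python
def average_probability(ps):
    """p_0 = (p_1 + p_2 + ... + p_s) / s for the postulated probabilities."""
    return sum(ps) / len(ps)

# hypothetical one-year survival probabilities for five persons
ps = [0.98, 0.97, 0.99, 0.96, 0.95]
print(average_probability(ps))   # ≈ 0.97
```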
54. The Theory of Dispersion. — As we mentioned before, the
empirical ratio a/s represents only an approximation of the ideal
ultimate value of lim_{v→∞} F(E, v). If we now make a series of
observations (trials) on the occurrence of a certain event E, such
that instead of a single set of observations of s individual ob-
servations we take N such sets, we shall have N relative frequency
ratios:
    a_1/s, a_2/s, a_3/s, · · ·, a_N/s.
Since the ratios are approximations only of the ultimate ratio
they will in general exhibit discrepancies as to their numerical
values and may be regarded as N different empirical approxima-
tions. The question now arises how these various empirical
ratios group themselves around the value of lim_{v→∞} F(E, v). The dis-
tribution of the empirical ratios around the ultimate ratio is by
Lexis called " dispersion."
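The dispersion of the empirical ratios can be exhibited numerically; a sketch that draws N sets of s trials and measures the scatter of the N ratios a_i/s (function name, seed, and parameters are assumptions of ours):

```python
import random
from statistics import pstdev

def frequency_ratios(p, s, n_sets, seed=3):
    """N empirical ratios a_i / s from N independent sets of s trials."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(s)) / s
            for _ in range(n_sets)]

ratios = frequency_ratios(0.5, 400, 200)
# The ratios scatter around the ultimate value 1/2; their standard
# deviation is one quantitative measure of the dispersion.
print(min(ratios), max(ratios), pstdev(ratios))
```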
55. Historical Development of the Law of Large Numbers. —
The first mathematician to investigate the problems we have
roughly outlined in the previous paragraphs was the renowned
Jacob Bernoulli in the classic, "Ars Conjectandi," which rightly
may be classified as one of the most important contributions on
the subject. Bernoulli's researches culminate in the theorem
which bears his name and forms the corner-stone of modern
mathematical statistics. That Bernoulli fully realized the great
practical importance of these investigations is proven by the
heading of the fourth part of his book which runs as follows:
" Artis Conjectandi Pars Quarta, trådens usum et applicationem
præcedentis doctrinæ in civilibus et ceconomicis." It is also
here that we first encounter the terms " a priori " and " a pos-
teriori " probabilities. Bernoulli's researches were limited to
such cases where the a priori probabilities remained constant
during the series or the whole sets of series of observations.
Poisson, a French mathematician, treated later in a series of
memoirs the more general case where the a priori probabilities
varied with each individual trial. He also introduced the technical
term, " Law of Large Numbers " (" Loi des Grand Nombres ").
Finally Lexis through the publication in 1877 of his brochure,
" Zur Theorie der Massenerscheinungen der menschlichen Gresell-
schaft," treated the dispersion theory and forged the closing
link of the chain connecting the theory of a priori probabilities
and empirical frequency ratios. Of late years the Russian mathe-
matician, Tchebycheff, the Scandinavian statisticians, Wester-
gaard and Charlier, and the Italian scholar, Pizetti, have con-
tributed several important papers. It is on the basis of these
papers that the following mathematical treatment is founded.
In certain cases, however, we shall not attempt to enter too
deeply into the theory of certain definite integrals, which is
essential for a rigorous mathematical analysis, but which also
requires an extensive mathematical knowledge which many of
my readers, perhaps, do not possess. To readers interested in
the analysis of the various integrals we may refer to the original
works of Czuber and Charlier.
CHAPTER VIII.
INTRODUCTORY FORMULAS FROM THE INFINITESIMAL
CALCULUS.
56. Special Integrals. — In the following chapters we shall
attempt to investigate the theory of probabilities from the stand-
point of the calculus. Although a knowledge of the elements
of this branch of mathematics is presupposed to be possessed
by the student, we shall for the sake of convenience briefly
review and demonstrate a few formulas from the higher analysis
of which we shall make frequent use in the following paragraphs.
All such formulas have been given in the elementary instruction
of the calculus, and only such readers who do not have this
particular branch of mathematics fresh in memory from their
school days need pay any serious attention to the first few
paragraphs.
57. Wallis's Expression for π as an Infinite Product. — We wish
first of all to determine the value of the definite integral:

    J_n = ∫_0^{π/2} sin^n x dx, (1)
under the assumption that n is a positive integral number. This
integral is geometrically equal to the area between the x axis,
the axis of y, the ordinate corresponding to the abscissa π/2,
and the graph of the function y = sin^n x. Letting u' = D_x u = sin x
and v = sin^{n−1} x, we get by partial integration:

    J_n = [− cos x sin^{n−1} x]_0^{π/2} + ∫_0^{π/2} (n − 1) sin^{n−2} x cos^2 x dx. (2)
If we substitute the upper and lower limits in the first term on
the right hand side of the above expression for J_n this term
reduces to 0, assuming n > 1. Thus we have:

    J_n = (n − 1) ∫_0^{π/2} sin^{n−2} x cos^2 x dx.
Putting cos^2 x = 1 − sin^2 x, we get:

    J_n = (n − 1) ∫_0^{π/2} sin^{n−2} x dx − (n − 1) ∫_0^{π/2} sin^n x dx. (3)

The last integral is, however, equal to J_n and the first integral
is, following the notation from (1), equal to J_{n−2}. We shall
therefore have:

    J_n = (n − 1) J_{n−2} − (n − 1) J_n,

or

    n J_n = (n − 1) J_{n−2}. (4)
Replacing n by n − 1, n − 2, n − 3, · · · successively we get:

    n J_n = (n − 1) J_{n−2},
    (n − 1) J_{n−1} = (n − 2) J_{n−3},
    (n − 2) J_{n−2} = (n − 3) J_{n−4},
    · · ·
According as n is even or uneven we shall have one of the
following equations at the bottom of the recursion formula:
    J_0 = ∫_0^{π/2} sin^0 x dx = ∫_0^{π/2} dx = π/2,

or

    J_1 = ∫_0^{π/2} sin x dx = [− cos x]_0^{π/2} = 1. (5)

If, for even values of n, we let n = 2m, and, for uneven values,
n = 2m − 1, we get finally the following recursion formulas:

    2m J_{2m} = (2m − 1) J_{2m−2},          (2m − 1) J_{2m−1} = (2m − 2) J_{2m−3},
    (2m − 2) J_{2m−2} = (2m − 3) J_{2m−4},  (2m − 3) J_{2m−3} = (2m − 4) J_{2m−5},
    · · ·
    2 J_2 = 1 · π/2,                        3 J_3 = 2 × 1.
Successive multiplication of the above equations gives us
finally:

    J_{2m} = [(2m − 1)(2m − 3) · · · 1 / (2m(2m − 2) · · · 2)] · π/2,
    J_{2m−1} = (2m − 2)(2m − 4) · · · 2 / ((2m − 1)(2m − 3) · · · 3). (6)
We may now draw some very interesting conclusions from the
above equations. Both integrals represent geometrically areas
bounded by the graphs of the functions:
    y = sin^{2m} x and y = sin^{2m−1} x respectively.

The difference of the ordinates of these graphs, namely:

    (sin x − 1) sin^{2m−1} x,

is evidently decreasing with increasing values of the positive
integer m, since sin x lies between 0 and +1 and sin^{2m−1} x ap-
proaches the value 0 except for certain values of x. The larger
we select m the less is the difference of the two areas and the
ratio will therefore approach 1, or the expression

    [(2m − 2)(2m − 4) · · · 2 / ((2m − 1)(2m − 3) · · · 3)] ÷ [((2m − 1)(2m − 3) · · · 1 / (2m(2m − 2) · · · 2)) · π/2] → 1.

Hence:

    π/2 = lim_{m→∞} [2^2 · 4^2 · 6^2 · · · (2m − 2)^2 · 2m] / [1 · 3^2 · 5^2 · · · (2m − 3)^2 (2m − 1)^2].

Multiplying numerator and denominator with 2^2 · 4^2 · 6^2 · · · (2m − 2)^2 we get:

    π/2 = lim_{m→∞} 2^{4m−4} · 2m · [(m − 1)!]^4 / [(2m − 1)!]^2 = lim_{m→∞} 2^{4m} (m!)^4 / [((2m)!)^2 · 2m].
This is the formula originally discovered by the English
mathematician, John Wallis (1616-1703), and by means of which
π may be expressed as an infinite product.
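The infinite product converges quite slowly, which a short numerical sketch makes visible; here we use the equivalent pairing of the factors into terms (2k)^2 / ((2k − 1)(2k + 1)), a rearrangement of the limit above rather than the text's own layout:

```python
import math

def wallis_half_pi(m):
    """Partial Wallis product: prod over k = 1..m of (2k)^2 / ((2k-1)(2k+1))."""
    value = 1.0
    for k in range(1, m + 1):
        value *= (2 * k) * (2 * k) / ((2 * k - 1) * (2 * k + 1))
    return value

# Even after 100,000 factors the partial product approaches pi/2
# from below and agrees with it only to about five decimal places.
print(wallis_half_pi(100_000), math.pi / 2)
```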
58. De Moivre — Stirling's Formula. — We are now in a position
to give a demonstration of Stirling's formula for the approximate
value of n! for large values of n. A. de Moivre seems to have
been the first to attempt this approximation. In the first edition
of his "Doctrine of Chances" (1718) he reaches a result, which
must be regarded as final, except for the determination of an
unknown constant factor. Stirling succeeded in completing this
last step in his remarkable "Methodus Differentialis" (1738).
In the second edition of "Doctrine of Chances" (1738) de Moivre
gives the complete formula with full credit to Stirling. He
mentions as his belief that Stirling in his final calculation possibly
has made use of the formula of Wallis. The demonstration by
the older English authors is rather lengthy and much shorter
methods have been devised by later writers. Most authors
make use of the Eulerian integral of the second order by which
any factorial may be expressed by a gamma function:

    Γ(n + 1) = ∫_0^∞ x^n e^{−x} dx = n!.
Another method makes use of the well-known Euler's Summation
Formula from the calculus of finite differences. This method is
of special interest to actuarial students, who frequently use the
Eulerian formula in the computation of various life contingencies.
For the benefit of those interested in this particular method we
may refer to the treatises of Seliwanoff and Markhoff, two
Russian mathematicians.^
The Italian mathematician, Cesàro, has, however, derived the formula in a much simpler manner.² Cesàro starts with the inequalities:

$$e < \left(1+\frac{1}{n}\right)^{n+1/2} < e^{1+\frac{1}{12n(n+1)}}. \tag{I}$$
From a well-known theorem on logarithms we have:

$$\log_e\left(1+\frac{1}{n}\right) = \frac{2}{2n+1}\left[1 + \frac{1}{3(2n+1)^2} + \frac{1}{5(2n+1)^4} + \cdots\right],$$

which also may be written as follows:

$$N = \left(n+\tfrac{1}{2}\right)\log_e\left(1+\frac{1}{n}\right) = 1 + \frac{1}{3(2n+1)^2} + \frac{1}{5(2n+1)^4} + \cdots.$$
If all the coefficients 3, 5, ··· are replaced by the number 3, we obtain a geometrical series. The summation of this infinite series shows that

$$1 < N < 1 + \frac{1}{3[(2n+1)^2-1]} = 1 + \frac{1}{12n(n+1)},$$

which is equivalent to the inequalities (I). If we let

$$u_n = \frac{n!}{n^{n+1/2}e^{-n}},$$
¹ Seliwanoff, "Lehrbuch der Differenzenrechnung," Leipzig, 1905, pages 59-60; Markhoff, "Differenzenrechnung," Leipzig, 1898.
² Cesàro, "Corso di analisi algebrica," Torino, 1884, pages 270 and 480.
then

$$\frac{u_n}{u_{n+1}} = \frac{1}{e}\left(1+\frac{1}{n}\right)^{n+1/2}.$$
Dividing the quantities in (I) by e we have:

$$1 < \frac{u_n}{u_{n+1}} < e^{\frac{1}{12n(n+1)}}. \tag{II}$$
The exponent of e may be written as follows:

$$\frac{1}{12n(n+1)} = \frac{1}{12n} - \frac{1}{12(n+1)}.$$
Making use of this relation, (II) may be written in the following form:

$$u_{n+1} < u_n, \qquad u_n e^{-1/(12n)} < u_{n+1} e^{-1/(12(n+1))}.$$

Denoting the quantity u_n e^{-1/(12n)} by u_n', we shall have two monotone number sequences:

$$u_1 > u_2 > u_3 > \cdots, \qquad u_1' < u_2' < u_3' < \cdots.$$
These two sequences show some very remarkable features. With increasing values of n the values of u_n decrease, or the sequence is a monotone decreasing number sequence. The values of u_n' become larger when n is increased and form therefore a monotone increasing number sequence. But any member of this latter sequence satisfies the inequality

$$u_n' < u_n.$$
Since both number sequences are situated in a finite interval, it follows from the well-known theorem of Weierstrass that they both have a clustering point, i. e., a point in whose immediate region an infinite number of points of the sequence are located.
Denoting this point of cluster by a, we have here an increasing and a decreasing monotone sequence which both converge towards a, or:

$$\lim_{n\to\infty} u_n' = \lim_{n\to\infty} u_n = a.$$
This relation may be illustrated by the accompanying diagram:
Since lim u_n = lim u_n' e^{1/(12n)} = a, we shall have
58] DB MoivRE — Stirling's formula.
for every finite value of n:

$$u_n = a\,e^{\theta/(12n)} \qquad (0 < \theta < 1).$$
[Diagram: the increasing sequence u_1' < u_2' < u_3' < ··· and the decreasing sequence ··· < u_3 < u_2 < u_1 closing in on the common limit a.]
This gives us finally the following expression for n!:

$$n! = a\,n^{n+1/2}e^{-n}e^{\theta/(12n)}. \tag{III}$$
In this expression we need only determine the unknown coefficient a. The formula of Wallis gives immediately:

$$\lim_{n\to\infty} \frac{(n!)^2\,2^{2n}}{(2n)!\sqrt{2n}} = \sqrt{\pi/2}.$$
Substituting in this latter expression the values for the factorials as found in (III) and neglecting the quantity θ/(12n), we have after a few reductions:

$$\lim_{n\to\infty} \frac{a\,n^{2n+1}\,2^{2n}}{(2n)^{2n+1/2}\sqrt{2n}} = \frac{a}{2} = \sqrt{\pi/2}, \quad \text{or} \quad a = \sqrt{2\pi},$$
from which we easily obtain the De Moivre-Stirling Formula in its final form:

$$n! = \sqrt{2\pi}\,n^{n+1/2}e^{-n}.$$

This remarkable approximation formula gives even for comparatively small values of n surprisingly accurate results. Thus for instance we have:

$$10! = 3{,}628{,}800; \qquad \sqrt{2\pi}\cdot 10^{10+1/2}\cdot e^{-10} = 3{,}598{,}696.$$
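The accuracy of the formula is easily examined with a short computation. The following Python sketch (illustrative only) compares the exact factorial with the De Moivre-Stirling value:

```python
from math import e, factorial, pi, sqrt

def stirling(n):
    # De Moivre-Stirling approximation: n! ~ sqrt(2*pi) * n**(n + 1/2) * e**(-n)
    return sqrt(2 * pi) * n ** (n + 0.5) * e ** (-n)

exact = factorial(10)       # 3,628,800
approx = stirling(10)       # about 3,598,696
```

The relative error for n = 10 is below one per cent, in accordance with the neglected correction factor e^{θ/(12n)}.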
CHAPTER IX.
LAW OF LARGE NUMBERS. MATHEMATICAL DEDUCTION.
59. Repeated Trials. — Let us consider a general domain of action wherein the determining causes remain constant and produce either one or the other of the opposite and mutually exclusive events, E and Ē, with the respective a priori probabilities p and q (q = 1 − p) in a single trial. The trial (observation) will, however, be repeated s times with the explicit assumption that the outward conditions influencing the different trials remain unaltered during each observation. The simplest example of observations of this kind is offered by repeated drawings of balls from an urn containing white and black balls only, and where the ball is put back in the urn and mixed thoroughly with the rest before the next drawing takes place. We keep now a record of the repetitions of the opposite events, E and Ē, during the s trials, irrespective of the order in which these two events may happen. This record must necessarily be of one of the following forms:
E happens s times, Ē happens 0 times;
E happens s − 1 times, Ē happens 1 time;
E happens s − 2 times, Ē happens 2 times;
· · · · · · · · ·
E happens 0 times, Ē happens s times.
In Chapter IV, Example 17, we showed that the probabilities of the above combinations of the two events, E and Ē, were determined by the expansion of the binomial

$$(p+q)^s.$$
The general term

$$\binom{s}{\alpha}p^\alpha q^\beta \qquad (\alpha+\beta=s)$$

is the probability P(E^α Ē^β) that E will happen α times and Ē β times in the s total trials. Each separate term of the binomial expansion of (p + q)^s represents the probability of the happening of the two events in the order given in the above scheme.
60. Most Probable Value. — In dealing with these various terms, it has usually been the custom of the English and French mathematicians, as well as many German scholars, to pay particular attention to a special term, the maximum term, which generally is known as the "most probable value" or the "mode." Russian and Scandinavian writers and the followers of the Lexis statistical school of Germany have preferred to make another quantity, known as the "probable" or "expected value," the nucleus of their investigations. Although it is our intention to follow the latter method, we shall first discuss, briefly, the most probable value. Two questions are then of special interest to us:
(1) What particular event is most probable to happen?
(2) What is the probability that an event will occur whose
probability does not differ from that of the most probable event
by more than a previously fixed quantity?
Neither of the two questions offers any particular difficulties of principle from a theoretical point of view. Regarding the probability P(E^α Ē^β), which we shall denote by T, as a function of the variable quantity α, T evidently will reach a maximum value for a certain value of α (β = s − α), and we need only determine the greatest term in the above binomial expansion.
In order to answer the second question we have only to pick
out all the terms which are situated between the two fixed limits.
Their sum is then the probability that those two limits are not
exceeded.
61. Simple Numerical Examples. — When s is a comparatively small number the actual expansion may be performed by simple arithmetic. We shall, for the benefit of the student, give a simple example of this kind.
A pair of dice is thrown 4 times in succession, to investigate the chance of throwing doublets. In a single throw the probability of getting a doublet is p = 6/36 = 1/6. Expanding (1/6 + 5/6)⁴ by means of the binomial theorem we get:

$$\left(\frac{5}{6}\right)^4 + 4\left(\frac{1}{6}\right)\left(\frac{5}{6}\right)^3 + 6\left(\frac{1}{6}\right)^2\left(\frac{5}{6}\right)^2 + 4\left(\frac{1}{6}\right)^3\left(\frac{5}{6}\right) + \left(\frac{1}{6}\right)^4.$$

Each of the above terms represents the probability of the occurrence of the various combinations of doublets (E) and non-doublets (Ē), and it is readily seen that the event of 1 doublet and 3 non-doublets, represented by the term 4(1/6)(5/6)³ = .3858, has the greatest probability. In other words it is the most probable event.
Let us next repeat the trial 12 times instead of 4. The 13 possible probabilities for the various combinations of doublets and non-doublets will then be expressed by the respective terms in the expression

$$\left(\frac{1}{6} + \frac{5}{6}\right)^{12}.$$

The 13 members have as their common denominator the quantity 2,176,782,336 and as numerators the following quantities: 1, 60, 1,650, 27,500, 309,375, 2,475,000, 14,437,500, 61,875,000, 193,359,375, 429,687,500, 644,531,250, 585,937,500, 244,140,625, which shows that the most probable combination is the one of 2 doublets and 10 non-doublets, having a numerical value equal to .2961.
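The reader may check these thirteen numerators with a few lines of Python (the variable names are arbitrary):

```python
from math import comb

s = 12
total = 6 ** s                                   # common denominator 2,176,782,336
# integer numerator of the probability of exactly k doublets in 12 throws
num = [comb(s, k) * 5 ** (s - k) for k in range(s + 1)]
best_k = max(range(s + 1), key=lambda k: num[k])
prob_best = num[best_k] / total
```

The numerators sum to the denominator, as they must, and the greatest term is the one for 2 doublets.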
A further comparison will show that the most probable event in the second series has the probability .2961, whereas .3858 was its value in the first series. In other words, the probability of the most probable event decreases when the number of trials (observations) is increased. This is due to the fact that the total number of possible cases becomes large with the increase of experiments.
Another question which presents itself, in this connection,
is the following: What is the probability that an event will occur
whose probability does not differ from the most probable value by more than a previously fixed quantity? Let us suppose we were asked to determine the probability that a doublet occurs not oftener than 6 times and not less than 1 time in 12 trials. This probability is found by adding the numerical values of the probabilities as given in the binomial expansion, from the term containing p = 1/6 to the power 6 down to the term containing p to the first power, or

$$\frac{14{,}437{,}500 + 61{,}875{,}000 + 193{,}359{,}375 + 429{,}687{,}500 + 644{,}531{,}250 + 585{,}937{,}500}{2{,}176{,}782{,}336} = .8866.$$
62. The Most Probable Value in a Series of Repeated Trials. — In the examples just given we determined the probability for the happening of the most probable event in a series of s observations by a direct expansion of the binomial (p + q)^s. This may be done whenever s is a comparatively small number. But when s takes on large values this method becomes impracticable, not to say impossible. Suppose that s = 1,400; then the actual straightforward expansion of (p + q)^{1400} would require a tremendous work of calculation which no practical computer would be willing to undertake. We must therefore in some way or other seek a method of approximation by which this labor of calculation may be avoided, and try to find an approximate formula by which we are able to express the maximum term in a simple manner, involving little computation and at the same time yielding results close enough for practical as well as theoretical purposes. Jacob Bernoulli in his famous treatise "Ars Conjectandi" was the first mathematician to solve this problem. Bernoulli also gave an expression for the probability that the departure from the most probable value should not exceed previously fixed limits. The method, however, was very laborious, and the final form was first reached by Laplace in "Théorie des Probabilités."
We saw before in Chapter IV that the general term

$$T = \binom{s}{\alpha}p^\alpha q^\beta$$
in the binomial expansion of (p + q)^s represented the probability that an event, E, will happen α times and fail β times in s trials, where p and q were the respective probabilities for success and failure in a single trial. The exponent α may here take all positive integral values in the interval (0, s), including both limits. The question now arises: which particular value of α, say α_n, will make the above quantity a maximum term in the expansion of the binomial? If α_n really is this particular value, then it must satisfy the following inequalities:
$$\underset{\text{(I)}}{\binom{s}{\alpha_n-1}p^{\alpha_n-1}q^{\beta_n+1}} \;\le\; \underset{\text{(II)}}{\binom{s}{\alpha_n}p^{\alpha_n}q^{\beta_n}} \;\ge\; \underset{\text{(III)}}{\binom{s}{\alpha_n+1}p^{\alpha_n+1}q^{\beta_n-1}}.$$

Dividing (II) by (III) and (II) by (I) we obtain the following inequalities:

$$\frac{(\alpha_n+1)q}{\beta_n p} \ge 1 \quad \text{and} \quad \frac{(\beta_n+1)p}{\alpha_n q} \ge 1,$$

which also may be written

$$(\beta_n+1)p \ge \alpha_n q \quad \text{and} \quad (\alpha_n+1)q \ge \beta_n p.$$
The following reductions are self-evident:

$$(s-\alpha_n+1)p \ge \alpha_n(1-p), \quad \text{or} \quad sp+p \ge \alpha_n,$$

and

$$(\alpha_n+1)q \ge (s-\alpha_n)p, \quad \text{or} \quad \alpha_n q + \alpha_n p \ge sp - q, \quad \text{or} \quad \alpha_n \ge sp - q.$$

From which we see that α_n satisfies the following relation:

$$sp - q \le \alpha_n \le sp + p.$$
Since p + q = 1, we notice that α_n is enclosed between two limits whose difference in absolute magnitude equals unity. The whole interval in which α_n is situated being equal to unity, and since α_n must be an integral number, this particular α_n is determined uniquely as a positive integral number when both sp − q and sp + p are fractional quantities. If sp − q is an integral number, sp + p will also be integral, and α_n would have to be a
fractional number in order to satisfy the above inequality strictly. Since by the nature of the problem α_n can take positive integral values only, the binomial expansion of (p + q)^s must in this case have two terms which are greater than any of the rest. Dividing both sides of the inequality by s, we shall have:

$$p - \frac{q}{s} \le \frac{\alpha_n}{s} \le p + \frac{p}{s}.$$
Since both p and q are proper fractions, both p/s and q/s are less than 1/s. We may therefore safely assume that the highest possible difference between the two quotients α_n/s and β_n/s and the probabilities p and q will never exceed 1/s. Now if s is a very large number this quantity may be neglected, and we may therefore write α_n = sp and β_n = sq.
Substituting these values in our original expression for the general term of the binomial expansion we get as the maximum term:

$$T_{\max} = \frac{s!}{(sp)!\,(sq)!}\,p^{sp}q^{sq}.$$
63. Approximate Calculation of the Maximum Term, T_max. — When the trials are repeated a large number of times the straightforward calculation of the maximum term becomes very laborious. The only table facilitating an exact computation is in a work, "Tabularum ad Faciliorem et Breviorem Probabilitatis Computationem Utilium Enneas," by the Danish mathematician, C. F. Degen. This table, which was published in 1824, gives the logarithms to twelve places for all values of n! from n = 1 to n = 1,200. Degen's table is, however, not easily obtained, and even if it were, it would be of little or no value for factorials above 1,200!. Our only resort is therefore to find an approximate expression for the above value of n!. This is most conveniently done by making use of Stirling's formula for factorials of high orders. We have:

$$s! = s^{s+1/2}e^{-s}\sqrt{2\pi},$$
$$(sp)! = (sp)^{sp+1/2}e^{-sp}\sqrt{2\pi},$$
$$(sq)! = (sq)^{sq+1/2}e^{-sq}\sqrt{2\pi}.$$
Substituting the above values in the expression s!/[(sp)!(sq)!]·p^{sp}q^{sq} we get:

$$T_{\max} = \frac{s^{s+1/2}e^{-s}\sqrt{2\pi}\;p^{sp}q^{sq}}{(sp)^{sp+1/2}e^{-sp}\sqrt{2\pi}\,(sq)^{sq+1/2}e^{-sq}\sqrt{2\pi}},$$

which reduces to

$$T_{\max} = \frac{1}{\sqrt{2\pi spq}}$$

as an approximate value for the maximum term.
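The closeness of this approximation may be tested for a moderately large s. The following Python sketch (the chosen values s = 1000, p = 0.3 are merely illustrative) compares the exact maximum term with (2πspq)^{−1/2}:

```python
from math import comb, pi, sqrt

s, p = 1000, 0.3
q = 1 - p
a = round(s * p)                              # sp = 300 is here an integer
t_max = comb(s, a) * p ** a * q ** (s - a)    # exact maximum term of (p + q)^s
approx = 1 / sqrt(2 * pi * s * p * q)

# the term at a = sp is indeed not smaller than its neighbours
left = comb(s, a - 1) * p ** (a - 1) * q ** (s - a + 1)
right = comb(s, a + 1) * p ** (a + 1) * q ** (s - a - 1)
```

The two values agree to well within one per cent.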
Tchebycheff's Theorems. — Despite all that has been said about the most probable value, its use is somewhat limited, and it might well, without harm, be left out of the whole theory of probabilities. Just because an event is the most probable, it by no means follows that it is a very probable event. In fact the expression (√(2πspq))^{−1}, which for large values of s converges towards zero, shows that the most probable event in reality is a very improbable event. This statement may seem a little paradoxical; but it is easily understood by realizing that the most probable event is only one possible combination among a large number of nearly equally probable combinations of a different order.
Instead of finding the most probable event it is more important
in practical calculations to determine the average number or
mean value of the absolute frequencies of successes. In Chapter
V we pointed out the close relation between a mathematical
expectation and the mean value of a variable. This relation is
used by the Russian mathematician, Tchebycheff, as the basis
of some very general and far-reaching theorems in probabilities,
by means of which the Law of Large Numbers may be established
in an elegant and elementary manner.
64. Expected or Probable Value. — In Chapter V we defined the product of a certain sum, s, and the probability of winning such a sum as the mathematical expectation of s. It is, however, not necessary to associate the happening of the event with a monetary gain or loss; in fact, this often serves to confuse the reader, and we may generalize the definition as follows: If a variable α may assume any of the values a₁, a₂, a₃, ···, a_s, each with a respective probability of existence φ(a_i) (i = 1, 2, ···, s), and such that Σφ(a_i) = 1, then we define

$$\Sigma a_i\varphi(a_i) = e(\alpha)$$

as the expected value of α.
Some writers also use the term probable value instead of expected value. In other words, the expected value of a variable quantity, α, which may assume any one of the values a₁, a₂, ···, a_s, is the sum of the products of each individual value of the variable and the corresponding probability of existence of such value.
Suppose we now have two opposite and complementary events E and Ē for which the probabilities of happening in a single trial are equal to p and q = 1 − p respectively. When the trials are repeated s times, the probabilities of E happening s times and Ē no times, of E happening s − 1 times and Ē once, of E s − 2 and Ē 2 times, and so on, may be expressed by the individual terms of the expansion

$$(p+q)^s,$$

where the general term expressing the occurrence of E α times and of Ē (s − α) times is:

$$\varphi(\alpha) = \binom{s}{\alpha}p^\alpha q^{s-\alpha},$$

which is also the probability of the existence of the frequency number α. The variable in the binomial expansion is α, which may assume all values from 0 to s inclusive.
We now first of all proceed to find the expected value — or the mathematical expectation — of the following quantities:

$$\alpha, \qquad [\alpha - e(\alpha)] \qquad \text{and} \qquad [\alpha - e(\alpha)]^2.$$

We shall presently show the reason for the selection of the above expressions, which perhaps may appear, at present, somewhat puzzling to the student. In mathematical symbols the expected values of the above quantities are expressed as follows:

$$e(\alpha) = \Sigma\alpha\varphi(\alpha), \qquad e[\alpha - e(\alpha)] = \Sigma[\alpha - e(\alpha)]\varphi(\alpha)$$

and

$$e[\alpha - e(\alpha)]^2 = \Sigma[\alpha - e(\alpha)]^2\varphi(\alpha),$$

and the summation is to take place from α = 0 to α = s.
65. Summation Method of Laplace. The Mean Error. — The analytical difficulty lies in the summation of the expressions as given above. Laplace was the first to give a compact expression for the different sums in a simple and elegant manner. By the introduction of the parameter t Laplace writes:

$$\varphi(\alpha) = (p+q)^s = \Sigma\binom{s}{\alpha}p^\alpha q^{s-\alpha}$$

as

$$\varphi(t\alpha) = (tp+q)^s = \Sigma\binom{s}{\alpha}(tp)^\alpha q^{s-\alpha}.$$

Differentiating with respect to t, which it must be remembered is introduced as an auxiliary parameter only, we have:

$$\Sigma\alpha p\binom{s}{\alpha}(tp)^{\alpha-1}q^{s-\alpha} = sp(tp+q)^{s-1}.$$

Letting t assume the special value 1, the above sum becomes e(α), or

$$e(\alpha) = \Sigma\alpha\binom{s}{\alpha}p^\alpha q^{s-\alpha} = sp(p+q)^{s-1} = sp. \tag{I.}$$
We might, however, have obtained the same result in a much
shorter manner by the following consideration. The expecta-
tion for a single event among the s events is equal to p. Since
all the events are independent of each other, it follows from the
addition theorem that the complete expectation of the total s
cases is equal to sp.
We next proceed to determine the expression e[α − e(α)], or the expected value of the differences between the constant e(α) = sp and the individual values 0, 1, 2, ···, s which α may assume in the binomial expansion. The difference α − e(α) is known as the departure or deviation from the expected value. Some of these deviations will be positive, namely for all the values situated to the right of the maximum term, which also is the most probable term in the expansion of (p + q)^s, while the α's situated to the left of the maximum value will be less in magnitude than sp, and the deviation will therefore be negative. On account of the symmetrical form of the binomial expansion we may expect an
equal number of positive and negative deviations which, taken two and two at a time, are equal in absolute magnitude. The algebraic sum of all the deviations may therefore be expected to be equal to zero. We shall, however, prove in a rigidly analytical manner that this is actually so. We have

$$e[\alpha - e(\alpha)] = \Sigma[\alpha - e(\alpha)]\varphi(\alpha) = \Sigma\alpha\varphi(\alpha) - \Sigma e(\alpha)\varphi(\alpha) = \Sigma\alpha\varphi(\alpha) - sp\,\Sigma\varphi(\alpha).$$

The first term in this last expression we found, however, to be equal to e(α) = sp, and since Σφ(α) = 1 we have finally:

$$e[\alpha - e(\alpha)] = sp - sp = 0.$$
By squaring the quantity α − e(α) we get α² − 2αe(α) + [e(α)]², which is always positive no matter whether the above difference is positive or negative. As a preliminary step we shall find e(α²). Introducing the auxiliary parameter, t, we get:
$$\Sigma\binom{s}{\alpha}(tp)^\alpha q^{s-\alpha} = (tp+q)^s.$$

The first derivative with respect to t is:

$$\Sigma\alpha p\binom{s}{\alpha}(tp)^{\alpha-1}q^{s-\alpha} = sp(tp+q)^{s-1}.$$

Multiplying both sides of the equation by tp, we have:

$$\Sigma\alpha p\binom{s}{\alpha}(tp)^\alpha q^{s-\alpha} = stp^2(tp+q)^{s-1}.$$

Differentiating we get:

$$\Sigma\alpha^2 p^2\binom{s}{\alpha}(tp)^{\alpha-1}q^{s-\alpha} = sp^2(tp+q)^{s-1} + s(s-1)tp^3(tp+q)^{s-2}.$$

Dividing through with the constant factor p and letting t = 1 we have:

$$\Sigma\alpha^2\binom{s}{\alpha}p^\alpha q^{s-\alpha} = sp + s(s-1)p^2 = s^2p^2 + sp(1-p) = s^2p^2 + spq.$$
The expression on the left side is, however, nothing else than Σα²φ(α), or simply e(α²). This leaves the final result:

$$e(\alpha^2) = s^2p^2 + spq.$$
We have now:

$$[\alpha - e(\alpha)]^2 = \alpha^2 - 2\alpha e(\alpha) + [e(\alpha)]^2,$$

from which it follows:

$$e[\alpha - e(\alpha)]^2 = s^2p^2 + spq - 2s^2p^2 + s^2p^2 = spq.$$

Denoting this latter quantity by the symbol [ε(α)]², we have:

$$[\varepsilon(\alpha)]^2 = e[\alpha - e(\alpha)]^2 = spq, \quad \text{or} \quad \varepsilon(\alpha) = \sqrt{spq}. \tag{II.}$$
The quantity ε(α), or simply ε, is commonly known as the mean error of the frequency number α in the Bernoullian expansion. The mean error is one of the most useful functions in the theory of probabilities and furnishes one of the most powerful tools of the statistician.
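Both results (I.) and (II.) may be verified directly from the binomial terms. A Python sketch (the values s = 20, p = 0.3 are arbitrary):

```python
from math import comb, sqrt

s, p = 20, 0.3
q = 1 - p
# phi[a] = probability of the frequency number a in (p + q)^s
phi = [comb(s, a) * p ** a * q ** (s - a) for a in range(s + 1)]

mean = sum(a * w for a, w in enumerate(phi))           # e(alpha)
mean_sq = sum(a * a * w for a, w in enumerate(phi))    # e(alpha^2)
err = sqrt(sum((a - mean) ** 2 * w for a, w in enumerate(phi)))
```

The sums reproduce e(α) = sp, e(α²) = s²p² + spq and ε(α) = √(spq) to the working precision.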
66. Mean Error of Various Algebraic Expressions. — We next proceed to prove some general theorems connected with the mean error. The mean error of the sum of two observed variables, α and β, is given by the formula:

$$\varepsilon(\alpha+\beta) = \sqrt{\varepsilon^2(\alpha) + \varepsilon^2(\beta)}.$$

Proof: Let

$$e(\alpha) = \Sigma\alpha\varphi(\alpha), \qquad e(\beta) = \Sigma\beta\psi(\beta),$$
$$\varepsilon^2(\alpha) = \Sigma[\alpha - e(\alpha)]^2\varphi(\alpha), \qquad \varepsilon^2(\beta) = \Sigma[\beta - e(\beta)]^2\psi(\beta)$$

be the respective expressions for the probable values and the mean errors of α and β, where of course Σφ(α) = 1 and Σψ(β) = 1.
Now φ(α_ν) is the probability for the occurrence of the special value α_ν of the variable α; in the same way ψ(β_μ) is the probability for the occurrence of β_μ. If α and β are independent of each other, then according to the multiplication theorem, φ(α_ν)ψ(β_μ) represents the probability for the simultaneous occurrence of α_ν and β_μ, as well as the probability of the occurrence of the difference α_ν + β_μ − e(α) − e(β), since the probable values e(α) and e(β) are constant quantities independent of either α or β. If ε denotes the mean error of α + β, then it follows from the definition of ε that

$$\varepsilon^2 = \Sigma\Sigma[\alpha + \beta - e(\alpha) - e(\beta)]^2\varphi(\alpha)\psi(\beta),$$

where the double summation is to take place for all possible values of the variables α and β.
The above expression may be written as:
$$\varepsilon^2 = \Sigma\Sigma[\alpha - e(\alpha)]^2\varphi(\alpha)\psi(\beta) + 2\,\Sigma\Sigma[\alpha - e(\alpha)][\beta - e(\beta)]\varphi(\alpha)\psi(\beta) + \Sigma\Sigma[\beta - e(\beta)]^2\varphi(\alpha)\psi(\beta).$$
A mere inspection will show that the first and the last terms of this expression equal ε²(α) and ε²(β) respectively. The first term may be written as follows:

$$\Sigma[\alpha - e(\alpha)]^2\varphi(\alpha)\,\Sigma\psi(\beta) = \varepsilon^2(\alpha),$$

since Σψ(β) = 1. The same also holds true for the last term.
With regard to the middle term, we found before that e[α − e(α)] = 0. Hence it follows by mere inspection that this term becomes 0. Thus we finally have:

$$\varepsilon^2(\alpha+\beta) = \varepsilon^2(\alpha) + \varepsilon^2(\beta), \quad \text{or} \quad \varepsilon(\alpha+\beta) = \sqrt{\varepsilon^2(\alpha)+\varepsilon^2(\beta)}.$$

The same reasoning holds for a difference, and it follows directly from the definition that

$$\varepsilon(k\alpha) = k\varepsilon(\alpha),$$
where k is a constant. This gives us the following theorems: The mean error of the sum or of the difference of two quantities is equal to the square root of the sum of the squares of each separate mean error. The mean error of any quantity multiplied by a constant is equal to this same constant multiplied by the mean error of the quantity. (See Appendix.)
The above theorems may easily be extended to any number of variables α, β, γ, ···, so that in general we have

$$\varepsilon(\alpha+\beta+\gamma+\cdots) = \sqrt{\varepsilon^2(\alpha)+\varepsilon^2(\beta)+\varepsilon^2(\gamma)+\cdots}.$$
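The theorem may be illustrated by convolving two independent binomial distributions. A Python sketch (the parameters are arbitrary):

```python
from math import comb, sqrt

def binom_pmf(s, p):
    """Terms of the expansion (p + q)^s."""
    return [comb(s, a) * p ** a * (1 - p) ** (s - a) for a in range(s + 1)]

def mean_error(pmf):
    """eps = sqrt( sum (a - e(a))^2 * phi(a) )."""
    m = sum(a * w for a, w in enumerate(pmf))
    return sqrt(sum((a - m) ** 2 * w for a, w in enumerate(pmf)))

pa = binom_pmf(10, 0.2)
pb = binom_pmf(15, 0.6)

# distribution of alpha + beta for independent alpha and beta
# (multiplication theorem applied to each pair of values)
psum = [0.0] * (10 + 15 + 1)
for a, wa in enumerate(pa):
    for b, wb in enumerate(pb):
        psum[a + b] += wa * wb

ea, eb, es = mean_error(pa), mean_error(pb), mean_error(psum)
```

The computed mean error of the sum equals √(ε²(α) + ε²(β)) within rounding.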
We shall later make use of this formula in a comparison of the different rates of mortality among different population groups. So far we have computed the mean error for the absolute frequencies of α, and the quantity √(spq) was compared with the most probable number of successes sp. But it may also be useful to know the mean error of the relative frequencies. This calculation is performed by reducing the mean error of the absolute
frequencies in the same ratio as the absolute frequencies are reduced to relative frequencies. We saw before that e(α) = sp. The relative frequency of the probable value is e(α)/s = sp/s = p. The mean error of p therefore is:

$$\varepsilon = \sqrt{\frac{pq}{s}}.$$
The following remarks of Westergaard are worthy of note: "When a length is measured in meters and this measure may be effected with an uncertainty of say 2 meters, the length in centimetres is then simply found by multiplication by 100, and the uncertainty is 200 cm. When we wish to find the mean error of p instead of sp we only need to divide the mean error √(spq) by s, which gives √(pq/s)."
The same result is also easily obtained from the formula ε(kα) = kε(α) when we let k = 1/s.
67. Tchebycheff's Theorem. — Tchebycheff's brochure appeared first in Liouville's Journal for 1866 under the title "Des Valeurs Moyennes." A later demonstration was given by the Italian mathematician, Pizetti, in the annals of the University of Geneva for 1892. The nucleus in both Tchebycheff's and Pizetti's investigations is the expression for the mean error:

$$\varepsilon^2(\xi) = \Sigma[\xi - e(\xi)]^2\varphi(\xi). \tag{1}$$
The variable ξ may be of any form whatsoever; it may thus for instance be the sum of several variables α, β, γ, ···, while φ(ξ) is the ordinary probability function for the occurrence of ξ. Let us denote the difference ξ_r − e(ξ) by v_r (r = 1, 2, 3, ···, s). We may then write the above expression for ε²(ξ) as:

$$\varphi(\xi_1)v_1^2 + \varphi(\xi_2)v_2^2 + \varphi(\xi_3)v_3^2 + \cdots + \varphi(\xi_s)v_s^2 = \varepsilon^2, \tag{2}$$
where a is an arbitrarily chosen constant, always larger than ε(ξ) in absolute magnitude. If we, in the above equation, select all the v's which are larger than a in absolute magnitude, together with their corresponding probabilities, and denote such quantities by v′, v″, v‴, ··· and φ(ξ)′, φ(ξ)″, φ(ξ)‴, ···, respectively, we have evidently:

$$\frac{v'^2\varphi(\xi)'}{a^2} + \frac{v''^2\varphi(\xi)''}{a^2} + \frac{v'''^2\varphi(\xi)'''}{a^2} + \cdots < \frac{\varepsilon^2}{a^2}. \tag{3}$$
For any one of these different v's, which is larger in absolute magnitude than a,

$$\frac{v^2}{a^2} > 1,$$

from which it follows a fortiori:

$$\varphi(\xi)' + \varphi(\xi)'' + \cdots = \Sigma\varphi^{(v)}(\xi) < \frac{\varepsilon^2}{a^2}. \tag{3a}$$
In this latter inequality, Σφ^{(v)}(ξ) is the total probability for the occurrence of a deviation from e(ξ) larger than a in absolute magnitude.
Let now P_T be the probability that the absolute value of the deviation is not larger than a; then 1 − P_T is the total probability that the deviation is larger than a. We have thus from the inequality (3a):

$$1 - P_T < \frac{\varepsilon^2}{a^2}, \quad \text{or} \quad P_T > 1 - \frac{\varepsilon^2}{a^2}. \tag{4}$$
Let also a = λε(ξ). We then have by a mere substitution in the above inequality:

$$P_T > 1 - \frac{1}{\lambda^2}. \tag{5}$$
This constitutes the first of Tchebycheff's criterions, which says: The probability that the absolute value of the difference |α − e(α)| does not exceed the mean error multiplied by a certain multiple, λ (λ > 1), is greater than 1 − (1/λ²).
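The criterion may be checked against an exact binomial computation. The following Python sketch (the values s = 50, p = 0.4 and λ = 2 are arbitrary) shows that the actual probability comfortably exceeds the Tchebycheffian lower bound:

```python
from math import comb, sqrt

s, p = 50, 0.4
q = 1 - p
phi = [comb(s, a) * p ** a * q ** (s - a) for a in range(s + 1)]
mean, err = s * p, sqrt(s * p * q)        # e(alpha) = sp, eps = sqrt(spq)

lam = 2.0
# exact probability that |alpha - e(alpha)| does not exceed lam * eps
p_t = sum(w for a, w in enumerate(phi) if abs(a - mean) <= lam * err)
bound = 1 - 1 / lam ** 2                  # Tchebycheff's lower bound
```

For λ = 2 the bound is .75, while the exact probability is above .9; the criterion is thus a conservative but perfectly general estimate.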
Now we made no restrictions as to the variable ξ, which may be composed of the sum of several independent variables α, β, γ, ···. We saw before that

$$\varepsilon^2(\alpha+\beta+\gamma+\cdots) = \varepsilon^2(\alpha) + \varepsilon^2(\beta) + \varepsilon^2(\gamma) + \cdots.$$

Tchebycheff's criterion may therefore be extended as follows: The Tchebycheffian probability, P_T, that the difference |α + β + γ + ··· − e(α) − e(β) − e(γ) − ···| will never exceed the mean error ε by a certain multiple, λ > 1, is greater than 1 − (1/λ²).
68. The Theorems of Poisson and Bernoulli proved by the Application of the Tchebycheffian Criterion. — Bernoulli in his researches limited himself to the solution of the problem in which the probabilities for the observed event remained constant during the total number of observations or trials. Poisson has treated the more general case, wherein the individual probability for the happening of the event in a single trial varies during the total s trials. This may probably best be illustrated by an urn schema. Suppose we have s urns U₁, U₂, ···, U_s with white and black balls in various numbers. Let the probability for drawing a white ball from the urns U₁, U₂, ···, U_s in a single trial be p₁, p₂, ···, p_s respectively, and q₁, q₂, ···, q_s the chances for drawing a black ball in a single trial. If a ball is drawn from each urn, what is the probability of drawing α white and s − α black balls in s trials? It is easily seen that the Bernoullian Theorem is a special case when the contents of the s urns and the respective probabilities for drawing a white ball in a single trial are the same for all urns.
69. Bernoullian Scheme. — We shall now show how the Tchebycheffian criterions may be used in answering the question given above. First of all we shall start with the simpler case of the Bernoullian urn-schema. Here the probability for drawing a white or a black ball from each of the s urns in a single trial is p and q respectively. The square of the mean error in a single trial is pq. From the formulas in § 66 it then follows:

$$\varepsilon^2 = \varepsilon_1^2 + \varepsilon_2^2 + \cdots = pq + pq + pq + \cdots \ (s \text{ times}) = spq,$$

or

$$\varepsilon = \sqrt{spq}.$$
While the above expression gives us the mean error of the absolute frequency of the variable α, the mean error of the relative frequency of α to the total number of trials, s, is given as

$$\varepsilon = \sqrt{\frac{pq}{s}}.$$

We now ask: What is the total probability that the absolute deviation of the relative frequency α/s from its expected value sp/s = p never becomes larger than λ times the mean error
ε = √(pq/s)? Letting λ = √s/t and using the symbol P_T for this particular probability, we have according to Tchebycheff's criterion:

$$P_T > 1 - \frac{1}{\lambda^2}, \quad \text{or} \quad P_T > 1 - \frac{t^2}{s}.$$

Since the mean error is equal to √(pq/s) we have:

$$\lambda\varepsilon = \frac{\sqrt{s}}{t}\sqrt{\frac{pq}{s}} = \frac{\sqrt{pq}}{t}.$$
The answer to our question above follows now a fortiori: The total probability that the absolute deviation of the relative frequency from the postulated a priori probability, p, never exceeds the quantity √(pq)/t is greater than 1 − (t²/s).
By taking t large enough we may reduce √(pq)/t (where pq is a fraction whose maximum value never can exceed 1/4) below any previously assigned quantity, δ, however small. If, for instance, we choose the value .0001 for δ, we may rest assured that √(pq)/t will be less than δ when we take t larger than 5,000. But no matter how large t is, so long as it remains a finite number, by letting s = ∞ as a limiting value, t²/s will simultaneously approach 0 as a limiting value. From the deductions thus derived we are now able to draw the following conclusions:
1) By letting s = ∞ as a limiting value, the probability, P_T, that the absolute difference between the relative frequency α/s and the postulated a priori probability, p, never becomes greater than √(pq)/t approaches 1, or certainty, as a limit.
2) By choosing the quantity t, which must be less than √s, sufficiently great, we may bring √(pq)/t below any previously assigned quantity, δ, or make the difference between p and α/s as small as we please.
From these conclusions we obtain a fortiori the following:

$$\lim_{s\to\infty}\frac{\alpha}{s} = p.$$

This constitutes the essential features of the Bernoullian Theorem.
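The tendency expressed by this limit may be exhibited by an exact computation for a symmetrical case. The following Python sketch (p = 1/2 and the deviation .01 are arbitrary choices) computes, without any approximation, the probability that the relative frequency stays within .01 of p:

```python
from fractions import Fraction
from math import comb

def prob_within(s, delta=Fraction(1, 100)):
    """Exact P(|alpha/s - 1/2| <= delta) for p = q = 1/2 in s trials."""
    half = Fraction(1, 2)
    favourable = sum(comb(s, a) for a in range(s + 1)
                     if abs(Fraction(a, s) - half) <= delta)
    return float(Fraction(favourable, 2 ** s))

small, large = prob_within(100), prob_within(10000)
```

For s = 100 the probability is only about .24, but for s = 10,000 it already exceeds .95, in accordance with the theorem.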
70. Poisson's Scheme. — Let p₁ denote the postulated probability for success in the first trial, p₂ in the second, p₃ in the
third, etc., and let furthermore q₁, q₂, q₃, ··· be the respective probabilities for the corresponding failures. If the trial (observation) is repeated s times we obtain the following values for the probable or expected value of the frequency of successes, e(α), and the mean error, ε:

$$e(\alpha) = p_1 + p_2 + p_3 + \cdots + p_s = \Sigma p_i, \tag{1}$$
$$\varepsilon = \sqrt{p_1q_1 + p_2q_2 + p_3q_3 + \cdots + p_sq_s} = \sqrt{\Sigma p_iq_i} \quad (i = 1, 2, 3, \cdots, s). \tag{2}$$
If by p₀ and q₀ we denote the arithmetic mean or the average value of the s p's and s q's, such that

$$p_0 = \frac{p_1 + p_2 + \cdots + p_s}{s}, \tag{3}$$
$$q_0 = \frac{q_1 + q_2 + \cdots + q_s}{s}, \tag{4}$$

and assume that p₀ and q₀ denote the constant probabilities during each of the s trials (observations), we should according to the Bernoullian Theorem have:

$$e(\alpha_B) = sp_0, \tag{5}$$
$$\varepsilon(\alpha_B) = \sqrt{sp_0q_0}, \tag{6}$$

where α_B stands for the absolute frequency in a Bernoullian series.
An actual comparison of (1), (3), and (5) shows that

$$e(\alpha_P) = e(\alpha_B), \tag{7}$$

where α_P is the symbol for the absolute frequency in a Poisson series. In other words: if the s trials had been performed with a constant probability of success equal to p₀ instead of with the varying probabilities p₁, p₂, ···, p_s, the expected or probable value would be the same for the Bernoullian and the Poisson scheme.
With regard to the mean error we find, however, after a little calculation:

$$\varepsilon_P^2(\alpha) = \varepsilon_B^2(\alpha) - \Sigma(p_i - p_0)^2. \tag{8}$$

The expression for the mean error in Poisson's Theorem is of the following form:

$$\varepsilon_P = \sqrt{p_1q_1 + p_2q_2 + p_3q_3 + \cdots + p_sq_s} = \sqrt{\Sigma p_iq_i} \quad (i = 1, 2, 3, \cdots, s).$$
Now Σp_iq_i may be transformed as follows. Writing

$$p_i = p_0 + (p_i - p_0),$$
$$q_i = q_0 - (p_i - p_0),$$

and multiplying, we obtain:

$$p_iq_i = p_0q_0 - (p_i - p_0)(p_0 - q_0) - (p_i - p_0)^2.$$

Summing up for all values of i from i = 1 to i = s, and remembering that Σ(p_i − p₀) = 0, we have:

$$\varepsilon_P^2 = sp_0q_0 - \Sigma(p_i - p_0)^2 = \varepsilon_B^2 - \Sigma(p_i - p_0)^2.$$

As (p_i − p₀)² always is a positive quantity, it is readily seen that the mean error in a Poisson scheme is always less than the mean error in the corresponding Bernoullian series.
Writing $\epsilon$ as follows:

$$\epsilon = \sqrt{p_1q_1 + p_2q_2 + \cdots + p_sq_s} = \sqrt{(p_1 + p_2 + \cdots + p_s) - (p_1{}^2 + p_2{}^2 + \cdots + p_s{}^2)},$$

and letting $\lambda = \sqrt{s}/t$, we have according to Tchebycheff's Theorem the following rule: The probability $P_t$ that the relative frequency remains inside the limits:

$$\frac{p_1 + p_2 + \cdots + p_s}{s} - \frac{\sqrt{s}}{t}\cdot\frac{\epsilon}{s} \quad\text{and}\quad \frac{p_1 + p_2 + \cdots + p_s}{s} + \frac{\sqrt{s}}{t}\cdot\frac{\epsilon}{s} \tag{9}$$

is greater than $1 - (1/\lambda^2)$ or $1 - (t^2/s)$.
By taking $t$ sufficiently large and by letting $s$ approach infinity as a limiting value, the deviation of the above limits from the average probability, $p_0$, namely $\lambda$ times the mean error divided by $s$, becomes smaller than any previously assigned quantity, $\delta$, however small, while $P_t$ at the same time will approach 1 as a limit.
From this it now follows: When an infinite number of trials is made on an event, following the scheme of Poisson, then:

$$\lim_{s\to\infty}\frac{a}{s} = \frac{p_1 + p_2 + \cdots + p_s}{s} = p_0.$$
The essential part of Poisson's Theorem is contained in this equation. When $p = p_1 = p_2 = \cdots = p_s$ we have a Bernoullian series and obtain:

$$\lim_{s\to\infty}\frac{a}{s} = p,$$

which result we already derived above in a direct way.
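The contrast between the two schemes lends itself to a short numerical sketch. The following is not part of the original text; the varying trial probabilities $p_i$ are illustrative choices:

```python
# Sketch comparing the Bernoullian and Poisson schemes of section 70.
# The trial probabilities p_i below are illustrative assumptions.
s = 1000
p_i = [0.2 + 0.6 * i / (s - 1) for i in range(s)]  # varying chances, mean 0.5
p0 = sum(p_i) / s                                   # average probability, eq. (3)
q0 = 1 - p0

# Expected frequency is the same under both schemes: e(a) = s * p0
expected = s * p0

# Mean errors: Bernoullian, eq. (6), versus Poisson, eq. (2)
eps_B = (s * p0 * q0) ** 0.5
eps_P = sum(p * (1 - p) for p in p_i) ** 0.5

# The identity (8): eps_P^2 = eps_B^2 - sum (p_i - p0)^2
identity_holds = abs(eps_P**2 - (eps_B**2 - sum((p - p0)**2 for p in p_i))) < 1e-9

print(round(expected, 6))    # 500.0
print(eps_P < eps_B)         # True: the Poisson mean error is the smaller
print(identity_holds)        # True
```

The identity is exact; the tolerance merely absorbs floating-point round-off.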
71. Relation between Empirical Frequency Ratios and Mathematical Probabilities. — In the above limit, $a$ indicates the total number of lucky events while $s$ is the total number of trials; the quotient $a \div s$ then is nothing more than the empirical probability as defined in the preceding paragraphs. Both the Bernoullian and Poisson Theorems show that this empirical probability approaches the postulated a priori probability, $p$ (or the average probability $p_0$), as a limiting value.
In this way we have succeeded in extending the theory of probability to other problems than the conventional kind involved in the games of chance or drawings of balls from urns. We do not need to limit our investigations to problems where we are able to determine a priori the probability for the happening of an event in a single trial, but may limit ourselves to postulating the existence of such an a priori probability.
A large number of trials or observations is made on a certain event $E$. This event is now observed to have occurred $a$ times during the $s$ total trials. To illustrate: An urn contains red and white balls, the total number of balls being unknown. A single ball is drawn and its color noted. This ball is replaced and the contents of the urn are mixed. A second drawing is made and the color of the drawn ball noted before the ball is put back in the urn. Let this process be repeated $s$ times, where $s$ is a large number, and furthermore let $a$ be the number of red balls which appeared during the $s$ trials.
The quotient $a \div s$ we now call the empirical or a posteriori probability for the observed event, in this particular case the a posteriori probability for the drawing of a red ball. When $s = \infty$ the Bernoullian Theorem tells us that the empirical probability found in this manner and the postulated a priori probability, whose numerical value, however, was unknown before the drawings took place, are identical as far as numerical
magnitude is concerned. As we already observed in the introductory remarks to this chapter, it is impossible to perform a certain experiment an infinite number of times, and it is therefore out of the question to determine the limiting and ideal value of the a posteriori probability; we must satisfy ourselves with an approximation by performing a finite number of trials, or let $s$ be a finite number. The quotient $a \div s$ is then the approximate empirical a posteriori probability. We know also that, although this quotient is only an approximation of the postulated a priori probability, by increasing $s$, or what amounts to the same thing, by making a large number of trials, the difference between the approximate empirical probability ratio, $a \div s$, and the a priori probability, $p$, becomes smaller as the number of trials is increased. But how small is the difference? Or how many times shall we repeat the trials (observations) so that, for practical purposes, we may disregard this difference?
It does not suffice to be satisfied with the fact that the difference becomes proportionately smaller the greater we make the number of trials, and merely insist that in order to avoid large errors it is only necessary to operate with very large numbers. Immediately the question arises: What constitutes a large number? Is 100 a large number, or is 1,000, 10,000, 100,000 or even a million an answer to this question? As long as this question remains unanswered, it helps but little to harp upon the "law of large numbers," a tendency which unfortunately is too manifest in many statistical researches by amateur statisticians. As long as a definition, much less a numerical determination, of the range of "small numbers" is lacking, little stress ought to be laid on remarks based on the metaphorical terms of "small" and "large" numbers.
72. Application of the Tchebycheffian Criterion. — It is readily seen that even a rough quantitative determination of the difference between the approximate a posteriori probability and the postulated a priori probability based upon the merely vague statement of "large numbers" is utterly impossible, and it remains to be seen, therefore, if the theory of probability offers us a criterion that might serve as a preliminary test for the above difference. To restate our problem: If $p$ is the postulated a priori
probability and $a \div s$ is the empirical probability (a posteriori) or relative frequency of the event, $E$, what is the probability that the difference $|(a/s) - p|$ does not exceed a previously assigned quantity?
In the mean error and the associated theorem of Tchebycheff
we have a simple and easily applied criterion to test this prob-
ability.
Tchebycheff's rule states that the probability, $P_t$, of a deviation of a variable from its probable value, not larger than $\lambda$ times its mean error, is greater than $1 - (1/\lambda^2)$. For

$$\lambda = 3:\quad P_t > 1 - \tfrac{1}{9} = 0.888,$$

$$\lambda = 4:\quad P_t > 1 - \tfrac{1}{16} = 0.937,$$

$$\lambda = 5:\quad P_t > 1 - \tfrac{1}{25} = 0.96.$$
This shows that a deviation from the expected or probable
value of the variable equal to 4 or 5 times the mean error possesses
a very small probability and such deviations are extremely rare.
Let us for example assume that the observed rate of mortality in a certain population group is equal to .0200. Let furthermore the number exposed to risk equal 10,000. The mean error is

$$\left(\frac{.02 \times .98}{10{,}000}\right)^{\frac{1}{2}} = .0014.$$

If the number of lives exposed to risk was one million instead of 10,000, the mean error would be

$$\left(\frac{.02 \times .98}{1{,}000{,}000}\right)^{\frac{1}{2}} = .00014.$$

A deviation four times this latter quantity is equal to .00056, and according to Tchebycheff's criterion the probability for the non-occurrence of a deviation above .00056 is greater than .937, or the probability of dying inside a year will not be higher than .0206 or less than .0194. For an observation series of 4,000,000 homogeneous elements we might by a similar procedure expect to find a rate of mortality between 0.02 + 0.00028 and 0.02 - 0.00028. Thus we notice that the mean error of the relative frequency numbers decreases as the number of observations increases.
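The computation of the example above may be arranged as in the following sketch (the figures are those assumed in the text):

```python
# Mean error of an observed rate and Tchebycheff bounds (section 72).
def mean_error(rate, exposed):
    # mean error of the relative frequency: sqrt(p * q / s)
    return (rate * (1 - rate) / exposed) ** 0.5

rate, lam = 0.02, 4            # observed mortality rate; lambda = 4
for exposed in (10_000, 1_000_000, 4_000_000):
    e = mean_error(rate, exposed)
    lower, upper = rate - lam * e, rate + lam * e
    # Tchebycheff: P(|a/s - p| <= lam * e) > 1 - 1/lam^2 = 0.9375
    print(exposed, round(e, 5), round(lower, 4), round(upper, 4))
```

For one million exposed this reproduces the bounds .0194 and .0206 quoted above.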
CHAPTER X.
THE THEORY OF DISPERSION AND THE CRITERIA OF LEXIS
AND CHARLIER.
73. Bernoullian, Poisson and Lexis Series. — In the previous chapter we limited our discussion to single sets consisting of $s$ individual trials and found in the mean error and the criterion of Tchebycheff a measure for the uncertainty with which the relative frequency ratio $a/s$ as well as the absolute frequency $a$ were affected. How will matters now turn out if, instead of a single set, we make $N$ sets of trials? As already mentioned in paragraph 54, in general in $N$ such sets we shall obtain $N$ different values of $\alpha$, denoting the absolute frequency of the event, represented by the sequence

$$\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_N.$$
Our object is now to investigate whether the distribution of the above values of $\alpha$ around a certain norm is subject to some simple mathematical law and if possible to find a measure for such distributions.

In this connection it is of great importance whether the postulated a priori probabilities remain constant or not during the $N$ sample sets. Three cases are of special importance to us.¹
1. The probability of the happening of the event remains constant during all the $N$ sets. The series as given by the absolute frequencies in each set is known as a Bernoullian Series.

2. The same probability varies from trial to trial inside each of the $N$ sample sets, the variations being the same from set to set. The series as given by the absolute frequencies is in this case known as a Poisson Series.

3. The probability remains constant in any one particular set but varies from set to set. The absolute frequency series as produced in this way is called a Lexis Series.

The above definition of these three series may, perhaps, be made clearer by a concrete urn scheme.

¹ The terminology is due to Charlier.
A. Bernoullian Series. — $s$ balls are drawn one at a time from an urn containing black and white balls in constant proportion during all drawings. Such drawings constitute a sample set. Let us in this particular set have obtained say $\alpha_1$ white and $\beta_1$ black balls, where $\alpha_1 + \beta_1 = s$. We make $N$ sets of drawings under the same conditions, keeping a record of the white balls drawn in each set. The number sequence thus obtained,

$$\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_N,$$

is a Bernoullian Series.
B. Poisson Series. — $s$ individual urns contain white and black balls, the proportion of white to black varying from urn to urn. A single ball is drawn from each urn and its color noted. In this way we get $\alpha_1$ white and $\beta_1$ black balls constituting a set. The balls thus drawn are replaced in their respective urns and a second set of $s$ drawings is performed as before, resulting in $\alpha_2$ white and $\beta_2$ black balls. The number sequence,

$$\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_N,$$

of white balls in $N$ sets represents a Poisson Series.
C. Lexis Series. — $s$ balls are drawn one at a time under the same conditions as set No. 1 in the Bernoullian series. The $\alpha_1$ white and $\beta_1$ black balls thus drawn constitute the first set. In the second and following sets the composition of the urn is changed from set to set. The number sequence representing the number of white balls in the $N$ respective sets:

$$\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_N$$

is a Lexian Series. The scheme of drawings is the same as in the Bernoullian Series except that the proportion of white to black balls varies from set to set.
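The three urn schemes can be imitated by a short simulation sketch; the chance values and set sizes below are assumptions chosen for illustration, not figures from the text:

```python
import random

# Simulating the three series of section 73; parameters are illustrative.
random.seed(7)
s, N = 100, 200                                     # trials per set, number of sets
ps = [0.05 + 0.90 * i / (s - 1) for i in range(s)]  # within-set chances (Poisson)
pN = [0.30 + 0.40 * k / (N - 1) for k in range(N)]  # set-to-set chances (Lexis)

def draw_set(chances):
    # one sample set: count of "white balls" in s drawings
    return sum(1 for p in chances if random.random() < p)

bernoulli = [draw_set([0.5] * s) for _ in range(N)]   # constant chance throughout
poisson   = [draw_set(ps) for _ in range(N)]          # varies inside each set
lexis     = [draw_set([pN[k]] * s) for k in range(N)] # varies from set to set

def dispersion(series):
    m = sum(series) / len(series)
    return (sum((a - m) ** 2 for a in series) / len(series)) ** 0.5

# The Lexis record comes out markedly more dispersed than the other two;
# the Poisson record tends to be the least dispersed, per sections 74-75.
for name, series in (("Bernoulli", bernoulli), ("Poisson", poisson), ("Lexis", lexis)):
    print(name, round(dispersion(series), 2))
```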
74. The Mean and Dispersion. — Since we have no a priori reasons for choosing any one particular value of the various $\alpha$'s of the above sequences in preference to any other, we might give equal weight to each set and take the arithmetic mean as defined by the formula:

$$M = \frac{\alpha_1 + \alpha_2 + \alpha_3 + \cdots + \alpha_N}{N} \tag{I}$$

of the $N$ values of $\alpha$.
It will be unnecessary to enter into a detailed discussion of the mean, which is a quantity used on numerous occasions in every day life. We shall, however, define another important function known as the dispersion (standard deviation). The dispersion is denoted by the Greek letter, $\sigma$, and is defined by the formula

$$\sigma = \sqrt{\frac{(\alpha_1 - M)^2 + (\alpha_2 - M)^2 + \cdots + (\alpha_N - M)^2}{N}}. \tag{II}$$
We shall now attempt to find the expected value of the mean and the dispersion in the three series. First of all take the Bernoullian Series. Let the constant probability for success in a single trial be $p_0$. We have then for the various expected values or mathematical expectations of $\alpha$:

Set No. 1: $e(\alpha_1) = sp_0$
Set No. 2: $e(\alpha_2) = sp_0$
. . .
Set No. $N$: $e(\alpha_N) = sp_0$

or:

$$\frac{e(\alpha_1) + e(\alpha_2) + \cdots + e(\alpha_N)}{N} = \frac{\Sigma e(\alpha_\nu)}{N} = \frac{Nsp_0}{N} = sp_0 = M_B,$$

which shows that the mean in a Bernoullian Series of $N$ sample sets is equal to the expected value of the absolute frequency in a single set.
In regard to the dispersion we have for the various sets:

Set No. 1: $e(\alpha_1 - M)^2 = \epsilon^2(\alpha_1) = sp_0q_0$
Set No. 2: $e(\alpha_2 - M)^2 = \epsilon^2(\alpha_2) = sp_0q_0$
. . .
Set No. $N$: $e(\alpha_N - M)^2 = \epsilon^2(\alpha_N) = sp_0q_0$

Summing up and forming the mean we obtain for the expected value of the square of the dispersion in a Bernoullian Series, which we shall denote by the symbol $\sigma_B{}^2$:

$$\sigma_B{}^2 = \frac{\Sigma e(\alpha_\nu - M)^2}{N} = \frac{Nsp_0q_0}{N} = sp_0q_0.$$
This result shows that the dispersion in a Bernoullian Series is equal to the mean error, $\epsilon$, in a single set.

We now proceed to the Poisson Series. Let $p_1$ be the mathematical probability of the happening of the event in the first trial, $p_2$ the probability in the second trial, and so on for all $s$ trials, and let us furthermore denote the means of the $p$'s and $q$'s by:

$$p_0 = \frac{p_1 + p_2 + p_3 + \cdots + p_s}{s},$$

$$q_0 = \frac{q_1 + q_2 + q_3 + \cdots + q_s}{s}.$$
Applying a similar analysis as above we have:

Set No. 1: $e(\alpha_1) = p_1 + p_2 + \cdots + p_s = sp_0$
Set No. 2: $e(\alpha_2) = p_1 + p_2 + \cdots + p_s = sp_0$
. . .
Set No. $N$: $e(\alpha_N) = p_1 + p_2 + \cdots + p_s = sp_0$

The actual summation of the above values of $e(\alpha)$ gives us the following value of the mean in a Poisson Series:

$$M_P = sp_0.$$
Let us for a moment assume that all the drawings had been performed with a constant probability, $p_0$. According to the Bernoullian scheme we should then have:

$$M_B = sp_0.$$

An actual comparison shows that $M_B = M_P$. This shows that the same mean result is obtained if we draw $s$ balls from the urns $U_1, U_2, \ldots, U_s$ with their corresponding probabilities $p_1, p_2, \ldots, p_s$ for drawing a white ball, as would be obtained if we drew all the $s$ balls from a single urn where the composition is such that the ratio of the number of white to that of black balls is as $p_0 : q_0$, where $p_0$ and $q_0$ are defined as above.
Let us now see how matters turn out in regard to the dispersion. We have for the $N$ sets:

Set No. 1: $e(\alpha_1 - M)^2 = p_1q_1 + p_2q_2 + \cdots + p_sq_s = \Sigma p_\nu q_\nu = \epsilon^2(\alpha_1)$
Set No. 2: $e(\alpha_2 - M)^2 = p_1q_1 + p_2q_2 + \cdots + p_sq_s = \Sigma p_\nu q_\nu = \epsilon^2(\alpha_2)$
. . .
Set No. $N$: $e(\alpha_N - M)^2 = p_1q_1 + p_2q_2 + \cdots + p_sq_s = \Sigma p_\nu q_\nu = \epsilon^2(\alpha_N)$
In § 70 we showed, however, that $\Sigma p_\nu q_\nu$ could be expressed as follows:

$$\Sigma p_\nu q_\nu = sp_0q_0 - \Sigma(p_\nu - p_0)^2.$$

A simple straightforward calculation gives us now for the dispersion, $\sigma_P$:

$$\sigma_P{}^2 = \sigma_B{}^2 - \Sigma(p_\nu - p_0)^2.$$

In the corresponding Bernoullian Series with constant probability, $p_0$, the square of the dispersion is equal to $sp_0q_0$, which shows that the dispersion in a Poisson Series is less than the corresponding dispersion of the Bernoullian Series.
We finally come to the mean and the dispersion in the Lexian Series, which we shall denote by $M_L$ and $\sigma_L$ respectively. Let us furthermore define the two quantities $p_0$ and $q_0$ as follows:

$$p_0 = \frac{p_1 + p_2 + \cdots + p_N}{N},$$

$$q_0 = \frac{q_1 + q_2 + \cdots + q_N}{N}.$$

A computation along similar lines as above gives us first for the mean, $M_L$:

Set No. 1: $e(\alpha_1) = sp_1$
Set No. 2: $e(\alpha_2) = sp_2$
. . .
Set No. $N$: $e(\alpha_N) = sp_N$

Thus we have:

$$M_L = \frac{\Sigma e(\alpha_\nu)}{N} = \frac{\Sigma sp_\nu}{N} = s\left[\frac{p_1 + p_2 + \cdots + p_N}{N}\right] = sp_0.$$
For the dispersion we have the following expectations:

Set No. 1: $e(sp_0 - \alpha_1)^2$
Set No. 2: $e(sp_0 - \alpha_2)^2$
. . .
Set No. $N$: $e(sp_0 - \alpha_N)^2$

The expected value in the $\nu$th set is

$$e(sp_0 - \alpha_\nu)^2 = \Sigma(sp_0 - \alpha)^2\varphi_\nu(\alpha),$$
where $\varphi_\nu(\alpha)$ is the general term in the probability binomial $(p_\nu + q_\nu)^s = 1$. An analysis along similar lines as in § 65 gives us now:

$$e(sp_0 - \alpha_\nu)^2 = s^2p_0{}^2 - 2s^2p_0p_\nu + s^2p_\nu{}^2 + sp_\nu q_\nu = sp_\nu q_\nu + s^2(p_\nu - p_0)^2$$

as the expected value of the square of the difference between the mean and the absolute frequency in the $\nu$th set. For all $N$ sets we then have

$$\sigma_L{}^2 = \frac{s\Sigma p_\nu q_\nu}{N} + \frac{s^2}{N}\Sigma(p_\nu - p_0)^2.$$

We have, however, the following identity:

$$\Sigma p_\nu q_\nu = Np_0q_0 - \Sigma(p_\nu - p_0)^2,$$

and hence

$$\sigma_L{}^2 = \sigma_B{}^2 + \frac{s^2 - s}{N}\Sigma(p_\nu - p_0)^2.$$
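The two forms of $\sigma_L{}^2$ just derived can be checked numerically, as in the following sketch; the set probabilities are merely illustrative values:

```python
# Numerical check of the Lexis dispersion identity of section 74:
# sigma_L^2 = sigma_B^2 + ((s^2 - s)/N) * sum (p_nu - p0)^2.
s, N = 50, 8
p = [0.30, 0.35, 0.42, 0.48, 0.55, 0.61, 0.66, 0.70]  # illustrative set chances
p0 = sum(p) / N
q0 = 1 - p0

# Direct form: average over sets of e(s*p0 - a_nu)^2 = s*p_nu*q_nu + s^2*(p_nu - p0)^2
direct = sum(s * pv * (1 - pv) + s**2 * (pv - p0) ** 2 for pv in p) / N

# Closed form via the identity sum p_nu*q_nu = N*p0*q0 - sum (p_nu - p0)^2
sigma_B2 = s * p0 * q0
closed = sigma_B2 + (s**2 - s) / N * sum((pv - p0) ** 2 for pv in p)

print(abs(direct - closed) < 1e-9)   # True: the two forms agree
print(closed > sigma_B2)             # True: the Lexis series is the more dispersed
```

The agreement is exact by the algebra of the text; the tolerance only covers floating-point round-off.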
74a. Mean or Average Deviation. — Of quite another character than the standard deviation or dispersion is the so-called mean or average deviation, $\eta$, defined by means of the following relation:

$$\eta = \frac{\Sigma\,|\alpha_\nu - M|}{N},$$

where $|\alpha_\nu - M|$ means the absolute difference between $\alpha_\nu$ and $M$. We shall now proceed to determine the expected value of $\eta$ on the assumption that the observed data follow the Bernoullian Law. The mean in a Bernoullian series with constant probability $p_0$ we found before to be equal to $sp_0$, which was the expected value of $\alpha$ in a single sample set of $s$ trials. The expected value of the absolute difference in the $\nu$th set is therefore:

$$e\,|\alpha_\nu - sp_0| = \Sigma\,|\alpha - sp_0|\,\varphi_\nu(\alpha),$$

where as usual $\varphi_\nu(\alpha)$ is the binomial probability function. The deviations from $sp_0$ are partly positive and partly negative. We proved, however, before that

$$e(\alpha_\nu - sp_0) = \Sigma(\alpha - sp_0)\varphi_\nu(\alpha) = 0.$$
Hence it is readily seen that the sum of the positive deviations cancels the sum of the corresponding negative deviations, so that $e\,|\alpha_\nu - sp_0|$ equals twice the sum of the positive deviations. Positive deviations occur for values of $\alpha$ greater than $sp_0$, i. e., for all values which $\alpha$ may assume from $sp_0$ to $s$ in the binomial expansion $(p_0 + q_0)^s$.¹ Hence we have (omitting subscripts):

$$e\,|\alpha - sp| = 2\sum_{\alpha = sp}^{s}(\alpha - sp)\binom{s}{\alpha}p^{\alpha}q^{s-\alpha} = 2\left\{\sum_{\alpha = sp}^{s}\alpha\binom{s}{\alpha}p^{\alpha}q^{s-\alpha} - sp\sum_{\alpha = sp}^{s}\binom{s}{\alpha}p^{\alpha}q^{s-\alpha}\right\}.$$

The second of these sums represents the following function of $p$ and $q$:

$$f(p, q) = p^s + \binom{s}{1}p^{s-1}q + \binom{s}{2}p^{s-2}q^2 + \cdots + \binom{s}{sq}p^{sp}q^{sq}.$$

By partial differentiation in respect to $p$ and by following multiplication by $p$ we have:

$$p\frac{\partial f}{\partial p} = sp^s + (s-1)\binom{s}{1}p^{s-1}q + (s-2)\binom{s}{2}p^{s-2}q^2 + \cdots$$

Hence we may write:

$$e\,|\alpha - sp| = 2\left\{p\frac{\partial f}{\partial p} - spf\right\}.$$

Furthermore $f(p, q)$ is a homogeneous function in respect to $p$ and $q$ of the $s$th order. We may then apply the following well-known Eulerian Theorem from the differential calculus: If $f(p, q)$ is homogeneous of the $s$th order and has continuous first partial derivatives, then

$$p\frac{\partial f}{\partial p} + q\frac{\partial f}{\partial q} = sf.$$

Multiplying this relation by $p$ gives $spf = p^2(\partial f/\partial p) + pq(\partial f/\partial q)$, and hence we may write:

$$e\,|\alpha - sp| = 2\left\{p\frac{\partial f}{\partial p} - spf\right\} = 2pq\left\{\frac{\partial f}{\partial p} - \frac{\partial f}{\partial q}\right\}.$$

¹ $sp_0$ is taken to the nearest integer.
The partial derivatives of $f(p, q)$ with respect to $p$ and $q$ contain for the greater part identical terms, which cancel upon subtraction and leave only the single term

$$\frac{\partial f}{\partial p} - \frac{\partial f}{\partial q} = s\,\frac{(s-1)!}{(sp-1)!\,(sq)!}\,p^{sp-1}q^{sq} = sT_m.$$

We proved, however, in § 63 that the binomial term $T_m$ may be written approximately as follows:

$$T_m = \frac{1}{\sqrt{2\pi spq}}.$$

This gives us finally (again using the subscripts):

$$e\,|\alpha_\nu - sp_0| = 2sp_0q_0T_m = \sqrt{\frac{2sp_0q_0}{\pi}}$$
as the expected value of the absolute deviation in the $\nu$th sample set. This same relation evidently holds true for any other of the $N$ sample sets, which finally gives us the following result for $\eta$:

$$\eta = \sqrt{\frac{2sp_0q_0}{\pi}}.$$

The dispersion in a Bernoullian series we found before to be of the form:

$$\sigma_B = \sqrt{sp_0q_0}.$$

Hence we have the following relation between the dispersion and the mean deviation:

$$\sigma_B = \eta\,\sqrt{\frac{\pi}{2}} = 1.2533\,\eta.$$
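The ratio $\sigma_B = 1.2533\,\eta$ can be verified against the exact binomial distribution, as in the sketch below; the sample-set size and chance are assumed values:

```python
from math import comb, pi, sqrt

# Exact-binomial check of sigma/eta ~ sqrt(pi/2) = 1.2533 (section 74a).
s, p = 400, 0.5      # illustrative sample-set size and chance
q = 1 - p
mean = s * p

# Exact mean (average) deviation: eta = sum |alpha - s*p| * phi(alpha)
probs = [comb(s, a) * p**a * q**(s - a) for a in range(s + 1)]
eta = sum(abs(a - mean) * w for a, w in zip(range(s + 1), probs))

sigma = sqrt(s * p * q)          # dispersion of a Bernoullian set

print(round(sigma / eta, 3))     # close to 1.2533 for large s
print(round(sqrt(pi / 2), 4))    # 1.2533
```

The approximation improves as $s$ grows, since the derivation rests on the limiting form of the binomial term $T_m$.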
75. The Lexian Ratio and the Charlier Coefficient of Disturbancy. — The results given in the last few paragraphs may be embodied under the following captions.
1. The mean in a Poisson and Lexis Series is the same as the mean in a Bernoullian Series with constant probability $p_0$ in a single trial, where $p_0$ is defined as above.

2. The dispersion in a Poisson Series is less than the corresponding dispersion in a Bernoullian Series.

3. The dispersion in a Lexis Series is greater than the dispersion in a Bernoullian Series.
The mean and the dispersion of the Bernoullian Series occupy in this connection a central position and may be used as a standard of comparison with other series. This is the method adopted by Lexis in investigating certain statistical series, and we shall return to it in the following chapter. Lexis determines first in a direct manner the dispersion as defined by formula (II) from the statistical data as given by the number sequence $\alpha$. This process is known as the direct process (by Lexis called a physical process) and gives a certain dispersion, $\sigma$. After this the dispersion is computed by an indirect (combinatorial) process under the assumption that the series follows the Bernoullian distribution. The ratio $\sigma : \sigma_B$, which Charlier calls the Lexian Ratio and denotes by the symbol, $L$, may now give us an idea about the real nature of the statistical series as represented by the number sequence.
When $L = 1$, the series is by Lexis called a normal series.
When $L > 1$, the series is called hypernormal.
When $L < 1$, the series is a subnormal series.
It is easily seen from the respective formulas that the Poisson Series are subnormal series whereas the Lexian Series are hypernormal. The great majority of statistical series are, as we shall have occasion to see in the following chapter, of a hypernormal kind and correspond thus to the Lexian Series.

In § 74 we found the dispersion in the Lexis series as

$$\sigma_L{}^2 = \sigma_B{}^2 + (s^2 - s)\sigma_p{}^2,$$

where

$$\sigma_p{}^2 = \frac{\Sigma(p_\nu - p_0)^2}{N}.$$

The quantity, $\sigma_p$, is the natural measure of the variations in the chances from the mean or normal probability, $p_0$. It is,
however, dependent on the absolute values of these chances, so that if all chances are changed in the same proportion, $\sigma_p$ is also changed in the same proportion. Another drawback which influences the Lexian Ratio is the variation of the number $s$ in each sample set. In order to overcome this difficulty Charlier divides the above quantity $\sigma_p$ by $p_0$. Assuming that the variations in the individual probabilities within each set are of no perceptible influence on the dispersion, we have from the Lexian dispersion:

$$\sigma_p{}^2 = \frac{\sigma_L{}^2 - \sigma_B{}^2}{s^2 - s}.$$

Neglecting $s$ in comparison with $s^2$ and remembering that $M_B = sp_0$, we have as an approximation:

$$\frac{\sigma_p}{p_0} = \frac{\sqrt{\sigma_L{}^2 - \sigma_B{}^2}}{M_B} = \rho.$$

Charlier calls the quantity $100\rho$ the coefficient of disturbancy of the statistical series. It is readily seen that the Charlier coefficient is zero in normal series. For hypernormal series it is a positive real quantity whereas for subnormal series $\rho$ is imaginary.
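In practice the test runs as in the following sketch, where the record of $N$ sets is an assumed one chosen for illustration:

```python
from math import sqrt

# Sketch of the Lexis/Charlier test (section 75) on an assumed record.
alpha = [512, 488, 530, 470, 525, 495, 460, 540]    # successes in N sets of s trials
s = 1000
N = len(alpha)

M = sum(alpha) / N                                  # direct (physical) mean
sigma = sqrt(sum((a - M) ** 2 for a in alpha) / N)  # direct dispersion, formula (II)

p0 = M / s                                          # estimated average chance
sigma_B = sqrt(s * p0 * (1 - p0))                   # combinatorial (Bernoullian) dispersion

L = sigma / sigma_B                                 # Lexian ratio
print(round(L, 2))                                  # > 1: a hypernormal series

if L > 1:
    rho = sqrt(sigma**2 - sigma_B**2) / M           # Charlier's rho
    print(round(100 * rho, 2))                      # coefficient of disturbancy
```

Note that for a subnormal record ($\sigma < \sigma_B$) the square root would be of a negative quantity, which is the sense in which $\rho$ is said to be imaginary.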
CHAPTER XI.

APPLICATION TO GAMES OF CHANCE AND STATISTICAL PROBLEMS.
76. Correlation between Theory and Practice. — In the theoretical analysis just completed we treated the fundamental elementary functions in the theory of probabilities: the probability function, the expected or probable value of a variable quantity, the mean error, the dispersion and the coefficient of disturbancy.
The formulas thus derived were founded upon certain hypo-
thetical axioms, which formed the basis of a mathematical a
priori probability as defined by Laplace. As far as the purely
abstract mathematical analysis is concerned it matters but little
if the hypotheses are physically true or not, that is to say, if
they agree with physical facts in the universe as it is known to
us. A mathematical analysis may be made on the basis of
widely divergent hypotheses, a fact which is clearly shown in
the Euclidean and Non-Euclidean geometries. It is, however,
quite a different matter when we wish to apply our theory to
actual phenomena (physical observed events) as it is evident
that a correlation between hypothesis and actual facts follows
by no means a priori. It is, of course, true that the different hypotheses in the theory of probabilities are derived to a greater or less extent from outside sense data. Such sense data, however, give us only the effect and no clue whatsoever to the relation between cause and effect. In the application of our theory every hypothesis, or rather the results derived from such hypothesis, must be verified by actual experience. Before such a verification is made, we advise the reader to be sceptical and not trust too much in the authority of others but follow the sound
advice of Chrystal: "In mathematics let no man over-persuade
you. Another man's authority is not your reason." We can so
much more encourage an attitude of scepticism in view of the
fact that even among the leading mathematicians of the present
time there exists no uniform opinion as to the truth of the
axioms underlying the theory of probabilities.
77. Homograde and Heterograde Series. Technical Terms. — Whenever a common characteristic or attribute of several groups of observed individual objects or events allows a purely quantitative determination, it may be made the subject of a mathematical analysis, and in such cases we are often able to make excellent use of the theory of probabilities. Such quantitative measurements may be divided into various domains of classification. Traces of such classification are found in almost every treatise on mathematical statistics, but a uniform system of nomenclature is unfortunately lacking among the various statisticians, and any one reading the modern literature on mathematical statistics often notices various inconsistencies among the different authors. Mr. G. Udny Yule in his excellent treatise "Theory of Statistics" classifies the statistical series into "statistics of attributes" and "statistics of variables." Apart from the fact that Mr. Yule's statistics of variables is also a statistics of attributes, although of different grades, the author apparently ignores the criterion of Lexis and the associated criterion of Charlier. The German writers use the term "stetige und unstetige Kollektivgegenstände" (continuous and discontinuous collective objects), originally introduced by Fechner. Other writers, such as Johannsen of Denmark and Davenport of America, use still other terms. After having made a comparison of the various systems of classification I have in the following decided to adhere to the system of Charlier, wherein the observed statistical series are classified as homograde and heterograde.
If the individuals all possess the same character or attribute in the same grade (intensity), or if we disregard the different grades of the attributes, such individuals are called homograde, and the statistical series thus formed is a homograde series. If on the other hand we take into consideration the different varying grades of the attributes observed or measured and form the series accordingly we obtain a heterograde series. As examples of homograde series we may mention the observed recorded series of coin tossings, card drawings in reference to a specified event, number of births or deaths in a population group, etc. A coin when tossed will either show head or tail; a person will
either be dead or alive. There are no intermediate degrees, as for instance that of a half dead person. In all such series the dividing line between the occurrence of the event (attribute) $E$ and the occurrence of the opposite event $\bar{E}$ is distinct and suggests itself a priori, and there is no doubt as to the classification of the observed event.

The original record of observation of a homograde series, also known as the primary list, is simply a record of the presence or non-presence of a specified attribute of the individuals belonging to the group under observation and is of the following form:

Primary List of Homograde Individuals.

Symbol for the Individual.    Attribute Present ($E$).    Attribute Non-present ($\bar{E}$).
$I_1$                                   1
$I_2$                                                                 1
$I_3$                                   1
$I_4$                                   1
$I_5$                                                                 1
In this scheme the individuals $I_1$, $I_3$ and $I_4$ possess the attribute $E$ while the individuals $I_2$ and $I_5$ do not have this attribute.
In observing the presence of a specified attribute in a group of individual objects we meet, however, frequently series of quite another nature than the simple homograde series. When investigating the different measures of heights of persons inside a certain population group, no simple dichotomous (i. e., cutting in two) division into two opposite and mutually exclusive groups suggests itself a priori. It is of course true that we might divide the total population under observation into two subsidiary groups of tall individuals and short individuals. But the question then immediately arises: What constitutes a short or a tall person? The answer must necessarily be arbitrary. Persons above the height of 170 cm. may be classed as tall while persons falling short of such measure may be classed as short persons, and we might in this way form a primary homograde table of the form as given above. There is no logical reason, however, to choose the quantity 170 cm. as the dividing line, and comparatively
little value would result from such a classification. It is evident that all persons belonging to the groups of talls or shorts are not identical as to the particular attribute in question. The height is merely a characteristic which varies with each individual, and no two individuals have, mathematically speaking, the same height. If we take into consideration the different grades of height among the individuals and arrange the primary table accordingly we obtain a heterograde series of observations. The general form of the primary table of such series is:
Primary List of Heterograde Individuals.

Symbol for the Individual.    Grade of Attribute.
$I_1$                         $X_1$
$I_2$                         $X_2$
$I_3$                         $X_3$
$I_4$                         $X_4$
$I_5$                         $X_5$
. . .                         . . .
$I_n$                         $X_n$

Here the quantities $X_1, X_2, \ldots, X_n$ give the measures (in kilogram, liter, meter, etc.) of the characteristic in question.¹
As examples of heterograde series we may mention the lengths, volumes or weights of animals, plants or inorganic objects; astronomical observations as to the brightness of celestial objects; meteorological records of rainfall, temperature or barometer heights; the frequency of deaths among policyholders as to attained age in an assurance company; duration of sickness or disablement, etc.

The investigation of heterograde series is a problem of which we shall treat later under the theory of errors or frequency curves. The homograde series may, however, be explained fully by means of the Bernoullian, Poisson and Lexian Series as founded on the mathematical theory of probabilities in the previous chapters.
¹ It is to be noted that in the homograde series the primary list is given by abstract numbers while the heterograde series consists of concrete numbers.

78. Computation of the Mean and the Dispersion in Practice. — It would be superfluous to enter into a detailed demonstration of the practical calculation of either the mean or the dispersion
were it not for the fact that this calculation is performed with a lot of unnecessary and useless labor by the untrained student and even by many professional statisticians. By the ordinary school method the number zero is chosen as the starting point and all the variables are expressed in their absolute magnitudes, i. e., their distance from 0. In this way one often encounters multiplication and addition of large numbers. The Danish biologist and statistician, W. Johannsen, has illustrated the futility of this method in the following example taken from his treatise "Forelæsninger over Læren om Arvelighed" (Copenhagen, 1905).¹ Dr. Petersen, the director of the Danish Biological Station, counted the tail fin rays of 703 flounders (Pleuronectes) caught in the neighborhood of the Skaw. The observations follow:

Number of rays:   47 48 49 50 51 52  53  54  55 56 57 58 59 60 61
No. of flounders:  5  2 13 23 58 96 134 127 111 74 37 16  4  2  1
The ordinary way of computing the mean would be as follows:

$$[5 \times 47 + 2 \times 48 + 13 \times 49 + \cdots + 1 \times 61] \div 703,$$

where 703 is the total number of individuals under observation. In Chapter X we gave the following formula for the mean:

$$M = \frac{m_1 + m_2 + m_3 + \cdots + m_N}{N}. \tag{1}$$

This formula may evidently be written as follows:

$$M = M_0 + \frac{(m_1 - M_0) + (m_2 - M_0) + (m_3 - M_0) + \cdots + (m_N - M_0)}{N}. \tag{2}$$
In this expression $M_0$, which Charlier calls the provisional mean, is an arbitrarily chosen number. To show how the introduction of this quantity actually shortens the calculation of the mean, we return to the above quoted series of observations of tail fin rays of flounders.

¹ German edition "Elemente der exakten Erblichkeitslehre" (Jena, 1913), page 11.
Number of Rays (x) in 703 Flounders According to Observations of Dr. Petersen.
N = ΣF(x) = 703, M₀ = 53.

x.    F(x).   x − M₀.   (x − M₀)F(x).
47      5      −6          −30
48      2      −5          −10
49     13      −4          −52
50     23      −3          −69
51     58      −2         −116
52     96      −1          −96
53    134       0            0
54    127      +1         +127
55    111      +2         +222
56     74      +3         +222
57     37      +4         +148
58     16      +5          +80
59      4      +6          +24
60      2      +7          +14
61      1      +8           +8
Sum = Σ  703            −373 +845
We have now:

b = (845 − 373) ÷ 703 = 0.67, M = M₀ + b = 53.67.

The method is quite simple and needs hardly any explanation. From a cursory examination of the material we notice that the mean is situated in the neighborhood of the class of 53 rays. We choose therefore the provisional mean, M₀, as 53. We next form the algebraic differences x − M₀. These differences are then multiplied by F(x). The algebraic sum of these products divided by N = ΣF(x) gives us the value of b, which quantity added to M₀ gives the value of the mean, M.
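As a modern illustration (ours, not part of the original text), the provisional-mean method may be sketched in a few lines; the data are Dr. Petersen's flounder counts quoted above:

```python
# Charlier's provisional-mean method: M = M0 + b, with
# b = sum((x - M0) * F(x)) / N and M0 an arbitrarily chosen round number.
rays = [47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61]
freq = [5, 2, 13, 23, 58, 96, 134, 127, 111, 74, 37, 16, 4, 2, 1]

M0 = 53                             # provisional mean, chosen near the peak
N = sum(freq)                       # 703 flounders
b = sum((x - M0) * f for x, f in zip(rays, freq)) / N
M = M0 + b
print(N, round(b, 2), round(M, 2))  # 703 0.67 53.67
```

The multiplications now involve only small deviations instead of the full ray counts, which is precisely the saving Johannsen's example illustrates.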
To show a slightly modified form of the method we take the following observations of coal-mine accidents in Belgium, covering the period 1901–1910, from "Annales des Mines de Belgique." These data I have reduced to a stationary population group of 140,000 mine workers. In other words, the quantity s as defined in § 83 is equal to 140,000.
Number (m) of Persons Killed in Coal Mine Accidents in Belgium, 1901–1910.
s = 140,000, N = 10, M₀ = 140.

Year.   m.    m − M₀.   (m − M₀)².
1901   164     +24        576
1902   150     +10        100
1903   160     +20        400
1904   130     −10        100
1905   127     −13        169
1906   133      −7         49
1907   144      +4         16
1908   150     +10        100
1909   133      −7         49
1910   133      −7         49
Sum = Σ       −44 +68     1608
Hence

b = (68 − 44) ÷ 10 = 2.4, M = 140 + 2.4 = 142.4.
In this example it would probably have been easier to form the sum Σm directly and then obtain the mean by division by 10. The actual formation of the algebraic sums of mᵥ − M₀, however, greatly facilitates the calculation of the dispersion, σ, to which we now shall turn our attention.
The formula for the dispersion

σ² = Σ(mᵥ − M)² ÷ N  (ν = 1, 2, 3, …, N)  (3)

may evidently be written as follows:

σ² = Σ(mᵥ − M₀)² ÷ N − b²,  (4)

where b as usual means M − M₀, M₀ being the provisional mean. For Belgian coal mine accidents we thus obtain from the above data:

σ² = (1608 ÷ 10) − 5.76 = 155.04.
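A short sketch (ours, not the author's) of formula (4) applied to the Belgian figures:

```python
# Dispersion by the provisional-mean method, formula (4):
# sigma^2 = sum((m - M0)^2) / N - b^2, with b = M - M0.
deaths = [164, 150, 160, 130, 127, 133, 144, 150, 133, 133]  # 1901-1910
M0 = 140
N = len(deaths)
b = sum(m - M0 for m in deaths) / N             # (68 - 44) / 10 = 2.4
M = M0 + b                                      # 142.4
var = sum((m - M0) ** 2 for m in deaths) / N - b ** 2
print(M, round(var, 2))                         # 142.4 155.04
```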
Where the number of observed individuals is very large an arrangement such as that given above for the Belgian statistics becomes too bulky, and it is therefore customary to group the observations in classes, as for instance in the example of Dr. Johannsen. The dispersion is then computed according to the following elegant
method due to Charlier, from whose brochure "Grunddragen af den matematiska Statistiken" ("Rudiments of Mathematical Statistics") I take the following example:
Number of Boys (m) per 500 Children Born in 24 Provinces of Sweden During Each Month in 1883 and 1890.
s = 500, N = 576, M₀ = 257, w = 5.

Class Limits.   x.    F(x).   xF(x).   x²F(x).   (x+1)²F(x).
200–204        −11      1      −11      +121        100
205–209        −10
210–214         −9
215–219         −8      1       −8       +64         49
220–224         −7      2      −14       +98         72
225–229         −6      5      −30      +180        125
230–234         −5     13      −65      +325        208
235–239         −4     18      −72      +288        162
240–244         −3     47     −141      +423        188
245–249         −2     60     −120      +240         60
250–254         −1     81      −81       +81          0
255–259          0    108        0         0        108
260–264         +1     91      +91       +91        364
265–269         +2     60     +120      +240        540
270–274         +3     44     +132      +396        704
275–279         +4     22      +88      +352        550
280–284         +5     16      +80      +400        576
285–289         +6      6      +36      +216        294
290–294         +7
295–299         +8
300–304         +9      1       +9       +81        100
Sum = Σ               576      +14     +3596      +4200
The class interval in the above scheme was chosen as 5. The observed frequencies are given in column 3. We thus find that the greatest frequency of 108 falls in the class interval 255–259. Choosing this class interval as the origin we designate the other class intervals with their proper positive and negative numbers as shown in column 2. The provisional mean, M₀, is taken as the center of class 0, or M₀ = 257. In this way the class interval w = 5 is taken as the unit.
The whole calculation is very simple. We first of all form the product x × F(x). The sum of these products divided by 576 = N gives the distance, b, from the provisional mean to the arithmetic mean, expressed in units of the class interval, w.
We have thus:

b = w × 14 ÷ 576 = + 0.0243w = + 0.122,

or

M = 257 + b = 257.12.

The formula for the dispersion takes the form

σ² = w²[Σx²F(x) ÷ N − b²],

where b is expressed in units of the class interval. The table gives us Σx²F(x) = 3596, or

σ² = w²[3596 ÷ 576 − (0.024)²] = w² × 6.242,
σ = w × 2.498 = 12.49.
Charlier now checks the results by means of the following relation:

Σ(x + 1)²F(x) = Σx²F(x) + 2ΣxF(x) + ΣF(x).

For the above example we have:

Σx²F(x) = + 3,596
2ΣxF(x) = + 28
ΣF(x) = + 576
Sum = + 4,200 = Σ(x + 1)²F(x),

which proves the accuracy of the calculation.
The full elegance of the Charlier self-checking scheme is shown at a later stage under the calculation of the parameters of frequency curves. In the meantime the student may test the advantage of the provisional mean by trying to compute the mean and the dispersion by the conventional school method. A direct computation by this method would in the last example take about a whole day's labor.
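The grouped scheme lends itself readily to code. The following sketch (a modern addition of ours, using the occupied classes of the Swedish table) carries Charlier's control as an assertion:

```python
# Grouped Charlier scheme: deviations x counted in class-interval units w,
# with the control  sum((x+1)^2 F) = sum(x^2 F) + 2 sum(x F) + sum(F).
def charlier_scheme(xs, freqs, M0, w):
    N = sum(freqs)
    S1 = sum(x * f for x, f in zip(xs, freqs))        # sum of x F(x)
    S2 = sum(x * x * f for x, f in zip(xs, freqs))    # sum of x^2 F(x)
    check = sum((x + 1) ** 2 * f for x, f in zip(xs, freqs))
    assert check == S2 + 2 * S1 + N                   # Charlier's control
    b = w * S1 / N
    sigma = w * (S2 / N - (S1 / N) ** 2) ** 0.5
    return M0 + b, sigma

# Occupied classes of the Swedish table (x = 0 is the class 255-259).
xs    = [-11, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 9]
freqs = [1, 1, 2, 5, 13, 18, 47, 60, 81, 108, 91, 60, 44, 22, 16, 6, 1]
M, sigma = charlier_scheme(xs, freqs, M0=257, w=5)
print(round(M, 2), round(sigma, 2))                   # 257.12 12.49
```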
Before we proceed to apply the formulas previously demonstrated, we wish to call the attention of the reader to the following important properties of the mean and the dispersion:
1. The algebraic sum of the deviations from the mean — i. e., Σ(mᵥ − M) — is zero. This follows immediately from formula (2) of § 78. We have:
M = M₀ + b,

where M₀, the provisional mean, is an arbitrarily chosen number and b = Σ(mᵥ − M₀) ÷ N. If M₀ = M we have evidently b = 0, which proves the statement.
2. The dispersion (standard deviation) is the least possible root-mean-square deviation, i. e., the root-mean-square deviation is a minimum when the deviations are measured from the mean. We have (see formula (4)):

Σ(mᵥ − M₀)² ÷ N = σ² + b²,

from which the proposition follows a fortiori.
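A small numerical sketch (ours) of the second property, using the Belgian figures: the mean-square deviation about any origin M₀ equals σ² + b², and so is smallest when M₀ = M:

```python
# The mean-square deviation about an origin M0 equals sigma^2 + (M - M0)^2,
# hence is minimized when M0 coincides with the arithmetic mean M.
data = [164, 150, 160, 130, 127, 133, 144, 150, 133, 133]
N = len(data)
M = sum(data) / N                               # 142.4
var = sum((m - M) ** 2 for m in data) / N       # sigma^2, taken about M

for M0 in (130.0, 140.0, M, 150.0):
    msd = sum((m - M0) ** 2 for m in data) / N  # mean-square about M0
    assert abs(msd - (var + (M - M0) ** 2)) < 1e-9
    print(M0, round(msd, 2))                    # smallest when M0 == M
```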
79. Westergaard's Experiments. — The Danish statistician, Harald Westergaard, in his "Statistikens Teori i Grundrids" gives the following results of 10,000 observations divided into 100 equal sample sets of drawings of balls from a bag containing an equal number of red and white balls (the ball was returned to the bag after each drawing):

White: 33 34 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
Frequency: 0 1 1 2 2 2 3 3 4 6 6 6 11 9 6 10 4 8
White: 55 56 57 58 59 60 61 62 63
Frequency: 3 6 4 4 0 0 1 1 1.
The elements as resulting from Westergaard's drawings clearly represent a Bernoullian Series where the number of comparison s is equal to 100. Arranging the data in classes — taking 3 as the class interval — the computation of the mean and the dispersion is easily performed by means of the Charlier self-checking scheme.
Bernoullian Series. Number of White Balls in 100 Drawings (Westergaard).
s = 100, N = 100, M₀ = 49, w = 3.

m.      x.    F(x).   xF(x).   x²F(x).   (x+1)²F(x).
33–35  −5      1       −5        25         16
36–38  −4
39–41  −3      5      −15        45         20
42–44  −2      8      −16        32          8
45–47  −1     15      −15        15          0
48–50   0     25        0         0         25
51–53  +1     19      +19        19         76
54–56  +2     16      +32        64        144
57–59  +3      8      +24        72        128
60–62  +4      2       +8        32         50
63–65  +5      1       +5        25         36
Sum          100    (−51 +88)   329        503
Control Check.

Σx²F(x) = 329
2ΣxF(x) = 74
ΣF(x) = 100
Sum = 503 = Σ(x + 1)²F(x)

b = w(88 − 51) : 100 = w × 0.37 = 1.11,
or M = M₀ + b = 50.11,
σ² = w²[329 ÷ 100 − b²]¹ = w²(3.29 − 0.137) = 28.377,
or σ = 5.33.

Giving due allowance for the respective mean errors of the mean and the deviation we have finally:²

M = 50.11 ± 0.536, σ = 5.33 ± 0.378.
We shall now compare these values with the corresponding theoretical values of the Bernoullian series. The a priori probabilities of drawing red and white are in this example p = q = ½. Hence we have as the theoretical values for the mean and the dispersion:

M_B = 100 × ½ = 50, σ_B = √(100 × ½ × ½) = 5.

A comparison between the observed and the theoretical ideal values — taking into account the proper mean errors — shows a very close agreement as far as the dispersion is concerned, while the difference in the mean is about 1/5 of the mean error. A computation of the Lexian Ratio and the Charlier Coefficient of Disturbancy yields the following results:

L = 1.072; 100ρ = 3.68.

Taking into account the proper mean errors due entirely to the fluctuation of sampling we find, however, that our theoretical results and formulas of the previous chapters have been verified in an absolutely satisfactory manner.
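The comparison can be reproduced numerically. The sketch below is ours; the form 100ρ = 100√(σ² − σ_B²) ÷ M for the coefficient of disturbancy is the one used in the earlier chapters (an assumption here, as those chapters are not reproduced), and it recovers the printed 3.68 exactly:

```python
from math import sqrt

# Lexian ratio L = sigma / sigma_B and Charlier coefficient of disturbancy
# 100 * rho = 100 * sqrt(sigma^2 - sigma_B^2) / M, applied to Westergaard's
# series (s = 100 drawings per set, p = q = 1/2).
s, p = 100, 0.5
M_obs, sigma_obs = 50.11, 5.33         # observed values found above
sigma_B = sqrt(s * p * (1 - p))        # theoretical dispersion, 5.0

L = sigma_obs / sigma_B
rho100 = 100 * sqrt(sigma_obs ** 2 - sigma_B ** 2) / M_obs
print(round(L, 3), round(rho100, 2))   # 1.066 3.68
```

The small discrepancy with the printed L = 1.072 presumably reflects the extra decimals carried in the original computation.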
¹ b is expressed in units of w.
² For mean errors of M and σ see Addenda.

80. Charlier's Experiments. — In the above mentioned brochure, "Grunddragen," Charlier gives the results of a long series of card drawings illustrating the Bernoullian, the Poisson and the Lexian Series. As an example showing the frequency distribution in a Bernoullian Series Charlier made 10,000 individual drawings (with replacements) from an ordinary whist deck and recorded the number of black and red cards drawn in this manner. Arranging the drawings in sample sets of 10 individual drawings, M. Charlier gives the following table:
Bernoullian Series. Number (m) of Black Cards in Sample Sets of 10.
s = 10, N = 1,000, M₀ = 5, w = 1.

m.    x.    F(x).   xF(x).   x²F(x).   (x+1)²F(x).
0    −5       3      −15      +75         +48
1    −4      10      −40     +160         +90
2    −3      43     −129     +387        +172
3    −2     116     −232     +464        +116
4    −1     221     −221     +221           0
5     0     247        0        0        +247
6    +1     202     +202     +202        +808
7    +2     115     +230     +460      +1,035
8    +3      34     +102     +306        +544
9    +4       9      +36     +144        +225
10   +5
Sum:       1,000     −67   +2,419      +3,285
Control Check.

Σx²F(x) = + 2,419
2ΣxF(x) = − 134
ΣF(x) = + 1,000
Sum = + 3,285 = Σ(x + 1)²F(x)

From the above values we obtain:

b = − 67 : 1,000 = − 0.067; σ² = 2,419 : 1,000 − b² = 2.415.

Making due allowance for mean errors we have thus:

M = 5 − 0.067 = 4.933 ± 0.050; σ = 1.554 ± 0.035.

For the theoretical mean and dispersion we obtain the following values (p = q = ½):

M_B = 5; σ_B = 1.581,

which gives the following values for the Lexian Ratio and the Charlier coefficient:

L = .983, 100ρ is imaginary.

These results would indicate a slightly subnormal series. Taking into account the fluctuations due to sampling, for which the
mean error serves as a measure, the results become normal and serve again as a verification of the theory.
Poisson Series. — As an illustration of the frequency distribution in a Poisson Series Charlier made the following experiment:
From an ordinary whist deck was drawn a single card and the
color noted. Before the second drawing a spade was eliminated
from the deck and replaced by a heart from another deck of
cards, so that the deck then contained 12 spades, 13 clubs, 13
diamonds and 14 hearts; from this deck another card was drawn
and the color noted. Then another spade was eliminated and a
heart substituted. From this deck, containing 11 spades, 13 clubs, 13 diamonds and 15 hearts, a card was again drawn. The
drawings were in this manner continued until all the spades were
replaced by hearts. The same operation was applied to the
clubs, which were replaced by diamonds. After 27 drawings
the deck contained only red cards. Altogether 100 sample sets
of 27 drawings were made with the following results:
Poisson Series. Number (m) of Black Cards in Sample Sets of 27.
s = 27, N = 100, M₀ = 7, w = 1.

m.    x.    F(x).   xF(x).   x²F(x).   (x+1)²F(x).
3    −4       2       −8      +32         18
4    −3       6      −18      +54         24
5    −2      14      −28      +56         14
6    −1      14      −14      +14          0
7     0      22        0        0         22
8    +1      17      +17      +17         68
9    +2      14      +28      +56        126
10   +3       8      +24      +72        128
11   +4       1       +4      +16         25
12   +5       1       +5      +25         36
13   +6       1       +6      +36         49
Sum:         100      +16     +378       +510

Control Check: 378 + 32 + 100 = 510 = Σ(x + 1)²F(x).
The calculation of the mean and the dispersion with their respective mean errors yields the following result:

b = + 0.16, M = 7.16 ± 0.211,
σ² = 3.78 − (0.16)² = 3.754, σ = 1.937 ± 0.149.

The theoretical Poisson values according to the formulas of § 67 are:

M_P = 6.75, σ_P = 2.111.

If we now take the arithmetic mean of the various probabilities of drawing a black card we find that p₀ = ¼. If all the drawings had been performed with a constant probability we should according to the Bernoullian scheme have:

M_B = 27 × ¼ = 6.75, σ_B = √(27 × ¼ × ¾) = 2.25.

These results verify the formulas as obtained under the discussion of the Poisson Series. (M_P = M_B, σ_P < σ_B.)
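The theoretical Poisson values can be checked directly from the drawing scheme (a sketch of ours): for a Poisson series the mean is the sum of the varying probabilities, and the squared dispersion the sum of p·q over the s draws of a set.

```python
from math import sqrt

# Charlier's Poisson scheme: at draw i (i = 1..27) the deck holds 27 - i
# black cards out of 52.  M_P = sum(p_i); sigma_P^2 = sum(p_i * (1 - p_i)).
p = [(27 - i) / 52 for i in range(1, 28)]
M_P = sum(p)                                   # 6.75
sigma_P = sqrt(sum(pi * (1 - pi) for pi in p))

p0 = M_P / 27                                  # average probability, 1/4
sigma_B = sqrt(27 * p0 * (1 - p0))             # Bernoullian comparison, 2.25
print(round(M_P, 2), round(sigma_P, 3), round(sigma_B, 2))  # 6.75 2.111 2.25
```

These are exactly the values M_P = 6.75 and σ_P = 2.111 quoted above.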
Lexian Series. — In testing the Lexian Series Charlier first took 10 samples of 10 individual drawings in each sample from an ordinary whist deck. The number of black cards thus drawn was recorded. After this, 10 samples of the same magnitude were taken from a deck containing 25 black and 27 red cards; and then 10 samples from a deck with 24 black and 28 red cards. Of the total 270 samples (continued until the deck contained only red cards) Charlier gives the first 100, which gave the following result:
Lexian Series. Number (m) of Black Cards in 10 Drawings.
s = 10, N = 100, M₀ = 4.

m.    x.    F(x).   xF(x).   x²F(x).   (x+1)²F(x).
1    −3       4      −12      +36        +16
2    −2       9      −18      +36         +9
3    −1      19      −19      +19          0
4     0      21        0        0        +21
5    +1      23      +23      +23        +92
6    +2      10      +20      +40        +90
7    +3      12      +36     +108       +192
8    +4       2       +8      +32        +50
Sum:         100      +38     +294       +470

Control Check: 294 + 76 + 100 = 470 = Σ(x + 1)²F(x).
The final computations (with mean errors) give:

b = + 0.38, M = 4.38 ± 0.167,
σ² = 294 : 100 − b² = + 2.796, σ = + 1.672 ± 0.118.

The mean probability in all trials was:

p₀ = 21.50 : 52 = 0.4135, or M_B = sp₀ = 4.135,
σ_B = √(sp₀q₀) = 1.557.

A calculation of the mean and the dispersion according to the formulas under the Lexian Series (see § 74) gives, according to Charlier:

M_L = 4.135, σ_L = 1.643.
This shows that the dispersion in a Lexian Series is greater than the corresponding Bernoullian dispersion. The Lexian Ratio L = σ_L : σ_B has the value 1.06. The series according to the terminology of Lexis has a hypernormal dispersion, although a very small one. Charlier in "Grunddragen" (§ 30) says that when arranging the material in 27 samples, each sample containing 100 single trials, the Lexian Ratio has the value L = 3.82, indicating a greater hypernormal dispersion than in the smaller samples.
81. Experiments by Bonynge and Fisher. — As an additional verification of the Bernoullian, Poisson and Lexian Series my co-editor, Mr. Bonynge, and myself have repeated the experiments of Westergaard and Charlier in a slightly modified form.
Bernoullian Series. — In 20 sample sets, each set containing 500 individual drawings from an ordinary whist deck, I counted the number of diamonds drawn in each sample. My records gave the following scheme:
Bernoullian Series. Number of Diamonds (m) in 20 Sample Sets of 500 Drawings.
s = 500, N = 20, M₀ = 125.

m.     m − M₀.   (m − M₀)².
123      −2          4
143     +18        324
124      −1          1
133      +8         64
142     +17        289
130      +5         25
117      −8         64
122      −3          9
132      +7         49
109     −16        256
130      +5         25
139     +14        196
138     +13        169
129      +4         16
136     +11        121
121      −4         16
135     +10        100
124      −1          1
135     +10        100
116      −9         81
Sum:  −44 +122    1,910
The results with their respective mean errors are as follows:

M = 128.9 ± 2.01, σ = 8.962 ± 1.416.
The theoretical Bernoullian mean and dispersion have the values:

M_B = 125, σ_B = √(spq) = √(500 × ¼ × ¾) = 9.682,

where p = ¼ denotes the a priori probability of drawing a diamond.
Again I counted the number of aces (irrespective of color) which appeared in 100 sample sets of 100 individual drawings from the same deck of cards. The records arranged in classes gave the following scheme:
Number of Aces (m) in 100 Sample Sets of 100 Individual Drawings.
s = 100, N = 100, M₀ = 8, w = 1.

m.    x.    F(x).   xF(x).   x²F(x).   (x+1)²F(x).
2    −6       1       −6       36         25
3    −5       8      −40      200        128
4    −4       8      −32      128         72
5    −3       7      −21       63         28
6    −2       9      −18       36          9
7    −1      21      −21       21          0
8     0      13        0        0         13
9    +1      15      +15       15         60
10   +2       3       +6       12         27
11   +3       9      +27       81        144
12   +4       1       +4       16         25
13   +5       2      +10       50         72
14   +6       2      +12       72         98
15   +7
16   +8
17   +9       1       +9       81        100
Sum:         100      −55      811        801

Control Check: 811 − 110 + 100 = 801 = Σ(x + 1)²F(x).
b = − 55 : 100 = − 0.55,
M = M₀ + b = 7.45 ± 0.279 (with mean error),
σ² = 811 : 100 − b² = 7.808, or
σ = 2.794 ± 0.198 (with mean error).

The theoretical Bernoullian values are:

M_B = 100 × 1/13 = 7.69, σ_B = √(100 × 1/13 × 12/13) = 2.663.

A comparison between the empirical and the theoretical a priori values exhibits a close correspondence.
Poisson Series. — As an illustration of the Poisson Series Mr. Bonynge made the following experiment. A sample set of 20 single drawings of balls from an urn (one ball being drawn at a time) was made under the following conditions:

In drawing No. 1 the urn contained 20 white and 20 black balls.
In drawing No. 2 the urn contained 21 white and 19 black balls.
In drawing No. 3 the urn contained 22 white and 18 black balls.
. . . . . . . . . .
In drawing No. 20 the urn contained 39 white and 1 black ball.

Altogether Bonynge took 500 sample sets, which arranged in classes give the following scheme:
Poisson Series. Number of Black Balls (m) in 500 Sample Sets of 20 Individual Drawings (Bonynge).
s = 20, N = 500, M₀ = 5.

m.    x.    F(x).   xF(x).   x²F(x).   (x+1)²F(x).
0    −5       2      −10       50         32
1    −4       9      −36      144         81
2    −3      35     −105      315        140
3    −2      52     −104      208         52
4    −1      86      −86       86          0
5     0     109        0        0        109
6    +1      85      +85       85        340
7    +2      69     +138      276        621
8    +3      30      +90      270        480
9    +4      16      +64      256        400
10   +5       6      +30      150        216
11   +6       1       +6       36         49
Sum = Σ:    500      +72     1876       2520
Hence we have:

b = 0.144, M = 5.144, σ² = 3.732, σ = 1.932.

The theoretical Poisson values are:

M_P = 5.25, σ_P = 1.86 (see formulas, § 74).

The mean of the various probabilities of drawing a black ball is p₀ = 21/80. According to the Bernoullian scheme we should then have the following values for the mean and the dispersion:

M_B = 20 × 21/80 = 5.25, σ_B = (20 × 21/80 × 59/80)^½ = 1.968.

These values confirm the Poisson theorems (M_P = M_B, σ_P < σ_B).
Lexian Series. — As additional illustration of the Lexian Series
I took 20 sample sets, each set containing 500 drawings of a single ball from an urn (with replacements). The contents of the urn varied from set to set as follows:

Sample set No. 1: 20 white and 20 black balls.
Sample set No. 2: 21 white and 19 black balls.
Sample set No. 3: 22 white and 18 black balls.
. . . . . . . . . .
Sample set No. 20: 39 white and 1 black ball.

In the 21st set all the black balls were eliminated and the urn contained white balls only. This set, however, was not taken into consideration in calculating the mean and the dispersion.
Lexian Series. Number (m) of Black Balls in 20 Sample Sets of 500 Individual Drawings (Fisher).
s = 500, N = 20, M₀ = 130.

No.    m.    m − M₀.   (m − M₀)².
1     251    +121       14641
2     246    +116       13456
3     222     +92        8464
4     216     +86        7396
5     193     +63        3969
6     176     +46        2116
7     183     +53        2809
8     173     +43        1849
9     156     +26         676
10    135      +5          25
11    140     +10         100
12    127      −3           9
13    115     −15         225
14     96     −34        1156
15     78     −52        2704
16     69     −61        3721
17     55     −75        5625
18     43     −87        7569
19     29    −101       10201
20     19    −111       12321
Sum = Σ:   −539 +661    99012
b = (661 − 539) : 20 = 6.1, M = M₀ + b = 136.1 ± 15.86.
σ² = 99012 : 20 − b² = 4913.4, σ = 70.098 ± 11.09 (with mean errors).

The theoretical Lexian values are:

M_L = 131.25, σ_L = 72.676 (see § 74).
If the series represented a true Bernoullian Series, we should have

M_B = 500 × 21/80 = 131.25, σ_B = √(500 × 21/80 × 59/80) = 9.839.

These values confirm the Lexian Theorem (M_L = M_B, σ_L > σ_B).
A computation of the Charlier Coefficient of Disturbancy from the observed values gives:

100ρ = 50.80,

whereas the theoretical value is 55.38, showing a decidedly hypernormal dispersion, a result which was to be expected since the probabilities of drawing black vary from ½ to 1/40 in the various sets of samples.
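The Lexian values quoted from § 74 can be reproduced from the urn scheme itself. The sketch below is ours and uses one common statement of the Lexian formula, σ_L² = s·p̄q̄ + s(s − 1)σ_p², where σ_p² is the variance of the set probabilities (an assumption, since § 74 is not reproduced here); this form recovers the printed figures:

```python
from math import sqrt

# Fisher's Lexian urn scheme: sample set k (k = 1..20) draws with probability
# p_k = (21 - k) / 40 of black; s = 500 draws per set.
s = 500
p = [(21 - k) / 40 for k in range(1, 21)]
pbar = sum(p) / len(p)                               # 0.2625 = 21/80
var_p = sum((pi - pbar) ** 2 for pi in p) / len(p)   # variance of the p_k

M_L = s * pbar                                       # 131.25
sigma_B = sqrt(s * pbar * (1 - pbar))                # Bernoullian comparison
sigma_L = sqrt(s * pbar * (1 - pbar) + s * (s - 1) * var_p)
print(round(M_L, 2), round(sigma_B, 3), round(sigma_L, 2))  # 131.25 9.839 72.68
```

The hypernormal character is evident: σ_L is several times the Bernoullian σ_B.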
All the above experiments show a completely satisfactory
verification of the various theorems of the previous chapters
and may perhaps serve as a vindication of the followers of
Laplace, who like him hold that an a priori foundation for
probability judgments is indispensable.
CHAPTER XII.
CONTINUATION OF THE APPLICATION OF THE THEORY OF PROBABILITIES TO HOMOGRADE STATISTICAL SERIES.
82. General Remarks. — In this chapter it is our intention to discuss the application of the theory of probabilities to homograde statistical series with special reference to vital statistics. We owe the reader an apology, however, inasmuch as in the former paragraphs we have employed the term statistics without defining its meaning in a rigorous manner. A definition may perhaps appear superfluous since statistics nowadays is almost a household word. The term unfortunately is often employed as a mere phrase without any understanding of its real meaning. This applies especially to that band of self-styled statisticians, mere dilettanti, who, with an energy which undoubtedly could be better employed otherwise, attempt to investigate and analyze mass phenomena regardless of method and system. When investigations are undertaken by such dilettanti the common gibe that "statistics will prove anything" becomes, alas, only too true and proves at least that "like other mathematical tools they can be wielded effectively only by those who have taken the trouble to understand the way they work."¹
By the science of statistics we understand the recording and subsequent quantitative analysis of observed mass phenomena. By mathematical statistics (also called statistical methods) we understand the quantitative determination and measurement of the effect of a complex of causes acting on the object under investigation as furnished by previously recorded observations as to certain attributes among a collective body of individual objects.
Practical statistics — if such a name may be used — then simply becomes the mechanical collection of statistical data, i. e., the recording of the observed attributes of each individual. In no way do we wish to underestimate the importance of this process

¹ See Nunn, "Exercises in Algebra" (London, 1914), pages 432-33.
which is as important for the statistical analysis as is the gathering
of structural materials for the erection of a large building.
Mathematical statistics is thus the tool we must use in the final analysis of the statistical data. It is a very effective and powerful tool when used properly by the investigator. At the same time it is not an automatic calculating machine in which we need only put the material and read off the result on a dial. A person without any knowledge whatsoever about the nature of logarithms may in a few hours be taught how to use a logarithmic table in practical computations, but it would be foolish to view the formulas and criteria from probabilities when applied to statistical data in the same light as a table of logarithms in calculating work. Such formulas and criteria must be used with caution and discretion and only by those who have taken the trouble to make a thorough study of probabilities and master their real meaning and their relation to mass phenomena. If put in the hands of mere amateurs the formulas become as dangerous a toy as a razor to a child.
It is not our intention to give in this work a description of the technique of the collection of the material, which depends to a large extent on local social conditions and for which it is difficult to give a set of fixed rules. In the following we shall treat the mathematical methods of statistics exclusively, and furthermore make the theory of probabilities the basis of our investigations.
83. Analogy between Statistical Data and Mathematical Probabilities. — Let us for the moment imagine a closed community with a stationary population from year to year and let us denote the size of such a population by s. Let us furthermore suppose we were given a series of numbers:

m₁, m₂, m₃, … m_N,

denoting the number of children born in various years in this community. The ratios

m₁/s, m₂/s, m₃/s, … m_N/s

may then be looked upon as probabilities of a childbirth in various years. As Charlier justly remarks, "such an identification of a statistical ratio with a mathematical probability is
at first sight a mere analogy which possibly may have very little in common with the observed statistical phenomena, but a closer scrutiny shows the great importance for statistics of such a view." If such ratios could be regarded as mathematical probabilities, wherein the various m's were identical to favorable cases in s total trials, the mean and the dispersion could be determined a priori from the Bernoullian Theorem. The founders of mathematical statistics regarded the identification of an ordinary statistical series with a Bernoullian Series almost as axiomatic. This view is found even among some leading writers of the present time. Among others we apparently find this traditional view by the eminent English actuary, G. King, in his classic "Text Book." In Chapter II of this well-known standard actuarial treatise a probability is defined as follows: "If an event may happen in α ways and fail in β ways, all these ways being equally likely, the probability of the happening of the event is α ÷ (α + β)." With this definition as a basis King then deduces the elementary formulas of the addition and multiplication theorems. He then continues: "Passing now to the mortality table, if there be l_x persons living at age x, and if of these l_{x+n} survive to age x + n, then the probability that a life aged x will survive n years is l_{x+n} ÷ l_x = np_x." And again: "the probability that a life aged x and a life aged y will both survive n years is np_x × np_y."
From the above it would appear that the author unreservedly assumes a one-to-one correspondence between the l_{x+n} survivors and "favorable ways" as known from ordinary games of chance, and a similar correspondence between the original l_x persons and "equally possible cases." A simple consideration will show that there exists no a priori reason for such a unique correspondence between ordinary empirical death rates and mathematical probabilities. None of the original l_x persons can be considered as
¹ Mr. H. Moir in his "Primer of Insurance" tried to avoid the difficulty by giving a wholly new definition of "equally likely events." According to Moir "events may be said to be 'equally likely' when they recur with regularity in the long run." Apart from the half metaphorical term "in the long run" Mr. Moir fails to state what he means by the expression "with regularity." If the statement is to be understood as regular repetitions of a certain event in various sample sets, it is evident that we may obtain a regular recurrence of the observed absolute frequencies in a Poisson Series, where — as we know — the events are not equally likely. — A.F.
being "equally likely" in the sense of games of chance. Numerous factors such as heredity, environment, climatic and economic conditions, etc., play here a vital part in the various complexes embracing the original l_x persons.
The belief in an absolute identity of mathematical probabilities and statistical frequency ratios seems to have originated from Gauss. The great German mathematician — or rather the dogmatic faith in his authority as a mathematician — proved thus for a number of years a veritable stumbling block to a fruitful development of mathematical statistics. Gauss and his followers maintained that all statistical mass phenomena could be made to conform with the law of errors as exhibited by the so-called Gaussian Normal Error Curve. If certain statistical series exhibited discrepancies they claimed that such deviations arose from the limited number of observations. The deviations would become less marked if the number of observed values was enlarged and would eventually disappear as the number of observations approached infinity as its ultimate value. The Gaussian dogma held sway despite the fact that the Danish actuary, Oppermann, and the French mathematicians, Bienaymé and Cournot, had pointed out that several statistical series, despite all efforts to the contrary, offered a persistent defiance to the Gaussian law. The first real attack on the dogma laid down so authoritatively by Gauss was delivered by the French actuary, Dormoy, in certain investigations relating to the French census. It was, however, first after the appearance of the already mentioned brochure by Lexis, "Die Massenerscheinungen, etc.," that a correct idea was gained about the real nature of statistical series.
The Lexian theory was expounded in the previous chapters of this work, and we are therefore ready to enter upon the investigations of a few selected mass observations from the domain of vital statistics.
84. Number of Comparison and Proportional Factors. — In the mathematical treatment of the Lexian theory of dispersion we tacitly assumed that the total number of individual trials in a sample set, or the number of comparison, s, remained constant from set to set. In the observations on games of chance it
remained in our power to arrange the actual experiments in such a manner that s would be constant. In actual social statistical series such simple conditions do not exist. In comparing the number of births in a country with the total population it is readily noticed that the population does not remain constant but varies from year to year. For this reason the various numbers m denoting the births are not directly comparable with one another. We may, however, easily form a new series of the form:

(s ÷ s₁)m₁, (s ÷ s₂)m₂, (s ÷ s₃)m₃, …,

wherein the various numbers, m₁, m₂, m₃, …, corresponding to the numbers of comparison s₁, s₂, s₃, …, are reduced to a constant number of comparison s. This series is by Charlier called a reduced statistical series. Such a reduction requires, in many
Proportional Factors for a Hypothetical Stationary Population in Sweden and Denmark Equal to 5,000,000 and 2,500,000 Respectively.

Sweden.
Year.   Inhabitants.   s : s_k.
1876    4,429,713      1.1288
1877    4,484,542      1.1150
1878    4,531,863      1.1033
  79    4,578,901      1.0919
1880    4,565,668      1.0952
  81    4,572,245      1.0936
  82    4,579,115      1.0919
  83    4,603,595      1.0861
  84    4,644,448      1.0765
1885    4,682,769      1.0677
  86    4,717,189      1.0600
  87    4,734,901      1.0560
  88    4,748,257      1.0530
  89    4,774,409      1.0472
1890    4,784,981      1.0449
  91    4,802,751      1.0410
  92    4,806,865      1.0402
  93    4,824,150      1.0365
  94    4,873,183      1.0261
1895    4,919,260      1.0165
  96    4,962,568      1.0076
  97    5,009,632      0.9981
  98    5,062,918      0.9875
1899    5,097,402      0.9809
1900    5,136,441      0.9734

Denmark.
Year.   Inhabitants.   s : s_k.
1888    2,143,000      1.1666
  89    2,161,000      1.1569
1890    2,179,000      1.1473
  91    2,195,000      1.1390
  92    2,210,000      1.1312
  93    2,226,000      1.1230
  94    2,248,000      1.1121
1895    2,276,000      1.0984
  96    2,306,000      1.0841
  97    2,338,000      1.0694
  98    2,371,000      1.0544
  99    2,403,000      1.0404
1900    2,432,000      1.0280
  01    2,462,000      1.0154
  02    2,491,000      1.0036
  03    2,519,000      0.9925
  04    2,546,000      0.9819
1905    2,574,000      0.9713
  06    2,603,000      0.9604
  07    2,635,000      0.9488
  08    2,668,000      0.9370
  09    2,702,000      0.9252
1910    2,737,000      0.9134
  11    2,800,000      0.8929
1912    2,830,000      0.8834
cases, a certain correction. However, when the general ratios s ÷ s_k (k = 1, 2, 3, …, N) are close to unity the reduced series may be treated as a directly observed series. In most of the following examples taken from Scandinavian statistical tabular works the proportional factor s ÷ s_k is close to unity, as shown in the table above. For Sweden I have, following Charlier, assumed a stationary population s = 5,000,000. The corresponding Danish s I have taken as 2,500,000.
The above figures are taken from "Sveriges officielle statistik" and "Statistisk Aarbog for Danmark" for 1913 (Précis de Statistique, 1913).
85. Child Births in Sweden. — From Charlier's "Grunddragen"
I select the following example showing the number of children
born in Sweden in the period from 1881-1900 as reduced to a
stationary population of 5,000,000.
Number of Children Born in Sweden as to Calendar Year (Charlier).
s = 5,000,000, N = 20, M0 = 140,000.

Year.        m.        m - M0.      (m - M0)^2.
1881      145,230      +5,230       27,352,900
  82      146,640      +6,640       44,089,600
  83      144,320      +4,320       18,662,400
  84      149,360      +9,360       87,609,600
1885      146,600      +6,600       43,560,000
  86      148,270      +8,270       68,392,900
  87      148,020      +8,020       64,320,400
  88      143,680      +3,680       13,542,400
  89      138,300      -1,700        2,890,000
1890      139,600      -  400          160,000
  91      141,070      +1,070        1,144,900
  92      134,830      -5,170       26,728,900
  93      136,540      -3,460       11,971,600
  94      134,840      -5,160       26,625,600
1895      136,820      -3,180       10,112,400
  96      135,330      -4,670       21,808,900
  97      132,750      -7,250       52,562,500
  98      134,820      -5,180       26,832,400
  99      131,320      -8,680       75,342,400
1900      134,460      -5,540       30,691,600
Sum: Σ =         +53,190 - 50,390  654,401,400
From which we obtain:

b = (+ 53,190 - 50,390) : 20 = 140,
M = M0 + b = 140,140,
HOMOGRADE STATISTICAL SERIES.
σ² = 654,401,400 : 20 - b² = 32,700,470, or σ = 5,718.

The empirical probability of a birth (p0) is
p0 = M : s = 0.02803, so that q0 = 1 - p0 = 0.97197, and the
Bernoullian dispersion

σ_B = √(s·p0·q0) = 369.0.

The actual observed dispersion (5,718) is thus much greater
than the Bernoullian. The birth series is considerably hyper-
normal. The Lexian ratio has the value

L = 5,718 : 369.0 = 15.50,

while the Charlier coefficient of disturbancy is:

100ρ = 4.07.

Both the values of L and ρ show that the birth series by no
means can be compared with the ordinary games of chance but
is subject to outward perturbing influences.
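The whole computation above is easily mechanized. The following sketch restates it in modern form; the function name is ours, not Charlier's, and the routine simply reproduces the arithmetic of this paragraph from the reduced Swedish birth series:

```python
# Characteristics of a homograde series reduced to a common number of
# comparisons s: mean M, dispersion sigma, Bernoullian dispersion
# sigma_B, Lexian ratio L, and Charlier coefficient 100*rho.
import math

def homograde_characteristics(m, s, m0):
    n = len(m)
    b = sum(mk - m0 for mk in m) / n               # correction b
    M = m0 + b                                     # mean M = M0 + b
    var = sum((mk - m0) ** 2 for mk in m) / n - b ** 2
    sigma = math.sqrt(var)                         # observed dispersion
    p0 = M / s                                     # empirical probability p0 = M : s
    sigma_b = math.sqrt(s * p0 * (1 - p0))         # Bernoullian dispersion
    L = sigma / sigma_b                            # Lexian ratio
    rho100 = 100 * math.sqrt(var - sigma_b ** 2) / M   # Charlier 100*rho
    return M, sigma, sigma_b, L, rho100

# The Swedish birth series of the table above (s = 5,000,000, M0 = 140,000).
births = [145230, 146640, 144320, 149360, 146600, 148270, 148020,
          143680, 138300, 139600, 141070, 134830, 136540, 134840,
          136820, 135330, 132750, 134820, 131320, 134460]
M, sigma, sigma_b, L, rho100 = homograde_characteristics(births, 5_000_000, 140_000)
```

The routine returns M = 140,140, L ≈ 15.5 and 100ρ ≈ 4.07, in agreement with the values found above; it applies unchanged to the Danish series of the following paragraphs.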
86. Child Births in Denmark. — The following example shows
the corresponding birth series for Denmark in the 25-year period
from 1888-1912 as reduced to a stationary population of 2,500,000.
The computation of the various parameters follows:

b = (39,713 - 30,287) : 25 = + 377,
M = M0 + b = 73,377,
σ² = 281,208,156 : 25 - b² = 11,106,197.2,
σ_B² = s·p0·q0 = 71,223  (p0 = M : s = 0.0293508),
L = σ : σ_B = 12.5,
100ρ = 100√(σ² - σ_B²) : M = 4.52.
Number of Children Born in Denmark as to Calendar Year.
s = 2,500,000, N = 25, M0 = 73,000.

Year.       m.        m - M0.     (m - M0)^2.
1888      78,659      +5,659      32,024,281
  89      77,956      +4,956      24,561,936
1890      76,154      +3,154       9,947,716
  91      77,377      +4,377      19,158,129
  92      74,059      +1,059       1,121,481
  93      76,965      +3,965      15,721,225
  94      75,956      +2,956       8,740,636
1895      75,649      +2,649       7,017,201
  96      76,183      +3,183      10,131,489
  97      74,404      +1,404       1,971,216
  98      75,570      +2,570       6,604,900
  99      74,236      +1,236       1,527,696
1900      74,146      +1,146       1,313,316
  01      74,341      +1,341       1,798,281
  02      73,058      +   58           3,364
  03      71,802      -1,198       1,435,204
  04      72,359      -  641         410,881
1905      70,981      -2,019       4,076,361
  06      71,280      -1,720       2,958,400
  07      70,516      -2,484       6,170,256
  08      71,433      -1,567       2,455,489
  09      70,597      -2,403       5,774,409
1910      68,777      -4,223      17,833,729
  11      66,016      -6,984      48,776,256
1912      65,952      -7,048      49,674,304
Sum: Σ =         -30,287 +39,713 281,208,156
Practically the same deductions hold true for this Danish
series as for the Swedish series. We meet again a hypernormal
series subject to perturbing influences. The closeness of the
two values of the Charlier coefficient of disturbancy indicates
that the numbers of births in Sweden and Denmark apparently
are subject to the same outward disturbing influences.
87. Danish Marriage Series. — The following table shows the
number of marriages in Denmark from 1888-1912.
Number of Marriages in Denmark.
s = 2,500,000, N = 25, M0 = 18,000.

Year.       m.       m - M0.     (m - M0)^2.
1888      17,605     -  395        156,025
  89      17,622     -  378        142,884
1890      17,181     -  819        670,761
  91      17,017     -  983        966,289
  92      17,012     -  988        976,144
  93      17,676     -  324        104,976
  94      17,445     -  555        308,025
1895      17,736     -  264         69,696
  96      18,239     +  239         57,121
  97      18,676     +  676        456,976
  98      18,870     +  870        756,900
  99      18,661     +  661        436,921
1900      19,015     +1,015      1,030,225
  01      17,870     -  130         16,900
  02      17,712     -  288         82,944
  03      17,791     -  209         43,681
  04      17,895     -  105         11,025
1905      17,947     -   53          2,809
  06      18,592     +  592        350,464
  07      19,072     +1,072      1,149,184
  08      18,750     +  750        562,500
  09      18,453     +  453        205,209
1910      18,255     +  255         65,025
  11      17,749     -  251         63,001
1912      18,034     +   34          1,156
Sum: Σ =        -5,742 +6,617     8,686,841
Hence we have:

b = (6,617 - 5,742) : 25 = 35, M = M0 + b = 18,035,
σ² = (8,686,841 : 25) - b² = 346,249, σ = 588.43,
σ_B = 133.81, L = 4.41, 100ρ = 5.73.
We encounter again a hypernormal series with quite large
perturbations. For Sweden Charlier has computed the coef-
ficient of disturbancy for marriages in the period 1876-1900 and
found it to be 5.49. A comparison with the same quantity for
the above Danish data shows that the perturbing influences
for the two countries are about the same.
88. Stillbirths. — As another example from vital statistics I
give the number of stillbirths in Denmark from 1888-1912 as
compared with a hypothetical number of 70,000 births per annum.

Number of Stillbirths in Denmark as Reduced to a Stationary Number
of 70,000 Births per Annum.
s = 70,000, N = 25, M0 = 1,700.
Year.      m.      m - M0.    (m - M0)^2.
1888     1,861     +161       25,921
  89     1,924     +224       50,176
1890     1,830     +130       16,900
  91     1,779     + 79        6,241
  92     1,811     +111       12,321
  93     1,788     + 88        7,744
  94     1,719     + 19          361
1895     1,753     + 53        2,809
  96     1,714     + 14          196
  97     1,811     +111       12,321
  98     1,797     + 97        9,409
  99     1,737     + 37        1,369
1900     1,696     -  4           16
  01     1,732     + 32        1,024
  02     1,694     -  6           36
  03     1,685     - 15          225
  04     1,682     - 18          324
1905     1,705     +  5           25
  06     1,620     - 80        6,400
  07     1,723     + 23          529
  08     1,694     -  6           36
  09     1,665     - 35        1,225
1910     1,658     - 42        1,764
  11     1,658     - 42        1,764
  12     1,638     - 62        3,844
Sum: Σ =       -310 +1,184   161,216
Actual computation gives:

b = (1,184 - 310) : 25 = 34.96, M = 1,734.96,
σ² = 161,216 : 25 - b² = 5,226.44, 100ρ = 3.407.
The series is again hypernormal. We shall show presently,
when discussing the disturbing influences, that this series after
the elimination of the secular perturbations actually represents a
normal series. In the meantime we give a few examples relating
to accident statistics.
89. Coal Mine Fatalities. — The following table gives the
number of deaths from accidents in coal mines in various countries
in the period 1901-1910 together with the number of compari-
sons s.
Year    Belgium      Austria      England       France       Germany      Japan        United States
      s = 140,000   s = 68,000   s = 900,000  s = 180,000  s = 500,000  s = 110,000   s = 610,000
1901      164           81          1,224         218         1,170         263           1,982
  02      150           73          1,116         196           995         188           2,263
  03      160           50          1,134         184           960         278           1,952
  04      130           62          1,116         193           900         239           2,135
1905      127           99          1,215         187           930         354           2,214
  06      133           70          1,161       1,262           985         578           2,944
  07      1M            73          1,179         198         1,240         399           2,977
  08      150           58          1,188         171         1,355         262           2,220
  09      133           73          1,287         210         1,021         667           2,440
1910      133           63          1,530         194           985         245           2,391
This gives the following values for the Charlier coefficient:

                 100ρ
Belgium          2.55
Austria         13.85
England          4.71
France          34.19
Germany          9.27
Japan           44.12*
U. S. A.        12.07

* I doubt whether the Japanese data as given by the Bureau of Mines are
reliable.
The comparatively large values of ρ show that the fatal ac-
cidents in coal mines are subject to violent perturbations. The
disturbing influences are greatest for France, where the Charlier
coefficient is above 34, which immediately shows that some
powerful disturbing influence has made itself felt. Looking over
the table we find a very large number of deaths for the year 1906.
The extremely heavy death rate in this year was caused by the
Courrières mine explosion, in which 1,099 persons lost their lives,
and which marks probably the most fatal disaster in the whole history
of coal mining. Eliminating this catastrophe from the data in
the table given above we find indeed that the coefficient of dis-
turbancy becomes imaginary, indicating very stable conditions
in French mines. Thus, eliminating the more fatal catastrophes,
we get at least for France a subnormal series for the everyday
accidents. In order better to illustrate the influence of the
elimination of the most disturbing catastrophes I submit the
following two series, as reduced to a stationary s = 630,000, of
fatal coal mine accidents in the United States in the period
1900-1914 as recorded by the Bureau of Mines. The first series
shows the total number of deaths m_k; the second series gives the total
deaths m_k' per year after eliminating all such accidents in which
5 or more men were killed.
Number of Deaths from Accidents in Coal Mines in United States.
s = 630,000, N = 15.

Year.     m_k.     m_k'.        Year.     m_k.     m_k'.
1900     2,173    1,843         1908     2,293    1,967
  01     2,048    1,863           09     2,520    2,053
  02     2,337    1,837         1910     2,470    2,085
  03     2,016    1,768           11     2,350    1,984
  04     2,205    1,911           12     2,060    1,839
1905     2,286    1,964           13     2,350    1,957
  06     2,111    2,075         1914     2,070    1,810
  07     3,074    2,190
The first series gives a coefficient of disturbancy equal to 11.06,
while the same quantity for the second series has the value 5.51.
Despite the fact that the coefficient of disturbancy is reduced by
about 50 per cent., there still remain disturbing influences, which
clearly shows that conditions in American mines are not so stable
as in the mines of France, Belgium and England.
90. Reduced and Weighted Series in Statistics. — So far all
our problems in statistical analysis have been related to series
where the value of s was constant or where the ratio s : s_k was
so close to unity that it might be used as a factor of propor-
tionality. We shall now consider the case where this ratio differs
greatly from unity. As an illustration of this kind of series I
choose the number of fatal coal mine accidents in various states
of the American Federation together with the number of people
engaged in coal mining in these states. The figures as taken from
the report of the Bureau of Mines relate to the year 1914.¹
Number of Persons Engaged in Mining (s_k) and Number Killed
(m_k) in 20 States During the Year 1914.
s = 1,000, N = 20.

                      s_k.      m_k.    p0·s_k.   |m_k - p0·s_k|.
 1 Alabama           24,552     128       73         55
 2 Colorado          10,560      75       31         44
 3 Illinois          79,529     141      237         96
 4 Indiana           22,110      44       66         22
 5 Iowa              15,757      37       47         10
 6 Kansas            12,600      33       37          4
 7 Kentucky          26,332      61       79         18
 8 Maryland           5,675      18       17          1
 9 Missouri          10,418      19       31         12
10 New Mexico         4,021      18       12          6
11 Ohio              45,816      62      136         74
12 Oklahoma           8,948      31       27          4
13 Pennsylvania     176,746     595      524         71   (Anthracite Mines)
14 Pennsylvania     172,196     402      513        111   (Bituminous Mines)
15 Tennessee          9,680      26       29          3
16 Texas              4,900      11       15          4
17 Virginia           9,162      27       27          0
18 Washington         6,730      17       17          0
19 W. Virginia       74,786     371      223        148
20 Wyoming            8,363      51       26         26
Sum: Σ =            726,659   2,167                 709
It will be noted that the population engaged in mining varies
greatly from state to state. In making a simple reduction to a
common number of comparisons by a proportional factor it is
evident, however, that we would give the same weight to the
observations from New Mexico, with a population of miners equal to
¹ Catastrophes in the Eccles Mine in West Virginia and in the Royalton
Mine of Illinois are eliminated.
4,021, as to the mining population of the state of Pennsylvania,
where over 340,000 persons are engaged in the same industry.
This procedure is faulty. Let us imagine for the moment two sets
of drawings from a bag containing white and black balls. The
first sample set contained 10,000 drawings and the second set
only 100 drawings. If these series were reduced to a common
number of comparisons s = 1,000 we should have

(1,000 : 10,000)m1 and (1,000 : 100)m2

(m1 and m2 standing for the numbers of white balls) as the numbers
of white balls drawn in sample sets of 1,000 single drawings.
But these values are not equally reliable. The mean error in
the second series is in fact 10 times as large as the mean error in
the first series. In order to overcome this difficulty we ask the
reader to consider the following series:
reader to consider the following series:
The element — mi is repeated *i times
^8
In this way we obtain a series with *i + *2 + ^s + • • • + %
elements which may be termed a reduced and weighted series
since the larger s^ appears oftener than the smaller values of **.
We shall now see if it is possible to determine the expected value
of the mean and the dispersion if the series is supposed to follow
the BernouUian Law.
The mean is defined by the following relation:

M_s = [s1·(s : s1)m1 + s2·(s : s2)m2 + ... + sN·(s : sN)mN] : [s1 + s2 + ... + sN]
    = s·Σm_k : Σs_k.

Denoting the average empirical probability by p0 we have
Σm_k : Σs_k = p0 and:

M_s = s·p0.

As to the dispersion it takes on the following form:

σ² = [s1((s : s1)m1 - sp0)² + s2((s : s2)m2 - sp0)² + ... + sN((s : sN)mN - sp0)²] : Σs_k
   = s²·Σ[(m_k - s_k·p0)² : s_k] : Σs_k,   (k = 1, 2, 3, ..., N).
In finding the theoretical dispersion, assuming a Bernoullian
distribution for which p0 may be used as an approximation of
the mathematical a priori probability, we ask the reader to
examine the general term of the expression for σ², viz.:

(s² : s_k)(m_k - s_k·p0)² : Σs_k.

If the individual trials follow the Bernoullian Law the expected
value of the factor (m_k - s_k·p0)² takes the form:

ε[(m_k - s_k·p0)²] = Σ(m_k - s_k·p0)²·φ(m_k) = s_k·p0·q0.

This brings the general term for σ² to the form:

s²·p0·q0 : Σs_k.

Thus the expected value of σ² according to the Bernoullian
distribution may be written as follows:

σ_B² = N·s²·p0·q0 : Σs_k = t·s·p0·q0,

where as before p0 = Σm_k : Σs_k and t = Ns : Σs_k.
These formulas give us the means of computing the Lexian
Ratio and the Charlier coefficient of disturbancy in the ordinary
way. Some of the computations require, however, a great
amount of arithmetical work, and the goal is reached more
easily by making use of the mean deviation (§ 74a).
We found there the following relation:

σ = 1.2533·ϑ.

In the weighted series it is readily seen that the value of ϑ
will be of the form:

ϑ = s·Σ|m_k - s_k·p0| : Σs_k.

If the series may be assumed to follow a Bernoullian distri-
bution we have

σ = 1.2533·ϑ.
From the above formulas it is readily noticed that we may find
the mean and the dispersion directly from the observed series
without a preliminary reduction to a common number of com-
parisons s. This is in fact the method used in the above example
of coal mine accidents in various states. We have:

p0 = Σm_k : Σs_k = 2,167 : 726,659 = 0.002982,
ϑ = s·Σ|m_k - s_k·p0| : Σs_k = 1,000 × 709 : 726,659 = 0.9757,
σ = 1.2533 × ϑ = 1.223,
100ρ = 100√(σ² - σ_B²) : M_s = 40 approx.
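The weighted-series arithmetic of this paragraph may be sketched as follows; the function name is ours, and the little two-group example at the end is an illustration of our own, not one taken from the text:

```python
# Reduced and weighted series: mean and dispersion computed directly
# from the observed pairs (s_k, m_k), without first reducing every
# group to the common number of comparisons s.
def weighted_characteristics(sk, mk, s):
    n = len(sk)
    total_s = sum(sk)
    p0 = sum(mk) / total_s                       # p0 = sum m_k : sum s_k
    # weighted mean deviation: theta = s * sum |m_k - s_k p0| : sum s_k
    theta = s * sum(abs(m - sj * p0) for sj, m in zip(sk, mk)) / total_s
    sigma = 1.2533 * theta                       # sigma = 1.2533 * theta
    t = n * s / total_s
    sigma_b2 = t * s * p0 * (1 - p0)             # expected Bernoullian variance
    return p0, theta, sigma, sigma_b2

# A deliberately simple check: two groups drawn exactly at the common
# rate p0 = 0.01 show no disturbancy at all (theta = 0).
p0, theta, sigma, sigma_b2 = weighted_characteristics([100, 400], [1, 4], 1000)
```

Applied to the mining data of the table above the same routine gives values close to those just found in the text.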
The large value of the Charlier coefficient of disturbancy
clearly shows that conditions in coal mines by no means are
uniform in the whole Union but vary greatly according to the
locality. An actual computation shows in fact that in a few
states, such as Michigan and Iowa, we find an imaginary coeffi-
cient of disturbancy, whereas states such as Ohio and West Virginia
exhibit marked hypernormal series with a large coefficient of
disturbancy. The establishment of this fact is of some im-
portance in connection with accident assurance. Many sta-
tisticians seem to be of the opinion that a standard accident table
computed from the data of the whole Union ought to serve as
the basis for assurance premiums. Such a table would assume
uniform conditions all over the Union. The enormously high
value of ρ as computed above shows the fallacy of such a view.
91. Secular and Periodical Fluctuations. — In the last para-
graphs we have just learned how to detect the presence of dis-
turbing influences in a statistical series. A value of the Lexian
ratio differing from unity or a value of the Charlier coefficient of
disturbancy differing from zero indicates the presence of fluc-
tuations in the chances for the event or phenomena under in-
vestigation. After having established the presence of such
fluctuations it is the duty of the statistician to trace the sources
of the disturbing influences. This is in general done by means of
the theory of correlation, which will be discussed in the second
volume of this work.
It is, however, possible to classify the disturbances under two
categories, which by Charlier are termed secular and periodical
variations.¹ The periodical fluctuations are in general difficult
to discuss on account of the variations in the period of the dis-
turbing forces. In many cases we are in absolute ignorance
about the length of such a period and therefore unable to subject
the series to a mathematical analysis. If the length of the period
is known it is indeed not difficult to determine the periodical
disturbances. This is often the case in series giving the occur-
rence of a certain disease in various months. In statistics giving
the frequency of malaria in a community the observed cases are
¹ Lexis uses the terms "evolutionary" ("symptomatic") and "periodical"
("oscillating") for such fluctuations.
nearly all limited to the warmer months and infrequent in the
winter months.
In the secular fluctuations, due to certain outward influences
working continually in the same direction, it is quite easy to
calculate the rate of such variations.
Let β denote the increase (decrease) of the original probabilities
(p1, p2, p3, ..., pN) from set to set in the given statistical series,
so that

p2 - p1 = β,
p3 - p2 = β,
. . . . . . . . .
pN - p(N-1) = β.

We then have:

p_k = p1 + (k - 1)β.   (1)

The mean probability has the value:

p̄ = [p1 + p2 + p3 + ... + pN] : N
  = [p1 + (p1 + β) + (p1 + 2β) + ... + (p1 + (N - 1)β)] : N
  = p1 + [(N - 1)/2]β.   (2)
Eliminating p1 from (1) and (2) we have:

p_k = p̄ + (k - (N + 1)/2)β.

If the observed and reduced numbers m1, m2, m3, ..., mN may be
regarded as approximately coinciding with sp1, sp2, sp3, ..., spN,
we may write (2) as follows:

m_k - M = (k - (N + 1)/2)·sβ   (k = 1, 2, 3, ..., N).   (3)

In order to obtain an expression for sβ in known quantities we
must eliminate the quantity k. Multiplying both sides of the
equation (3) by k - (N + 1)/2 we have:

(m_k - M)(k - (N + 1)/2) = (k - (N + 1)/2)²·sβ.
Summing this expression for all values of k from k = 1 to k = N
we have:

Σ(k - (N + 1)/2)(m_k - M) = sβ·Σ(k - (N + 1)/2)².   (4)

The following expressions from the summation of series are
well known to the reader from elementary algebra:

Σk = N(N + 1)/2,   Σk² = N(N + 1)(2N + 1)/6.

Substituting these values in (4) we obtain after a few simple
transformations the following expression for sβ:

sβ = Σ(k - (N + 1)/2)(m_k - M) : [N(N² - 1)/12].
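In modern terms sβ is simply the least-squares slope of the series against k. A minimal sketch of the formula just derived (the function name is ours) checks it on an exactly linear series, where the slope must be recovered without error:

```python
# Secular coefficient s*beta of a series m_1, ..., m_N:
# s*beta = sum (k - (N+1)/2)(m_k - M) : [N(N^2 - 1)/12].
def secular_coefficient(m):
    n = len(m)
    mean = sum(m) / n
    c = (n + 1) / 2                      # the central value (N + 1)/2
    num = sum((k - c) * (mk - mean) for k, mk in enumerate(m, start=1))
    den = n * (n * n - 1) / 12           # = sum of (k - (N+1)/2)^2
    return num / den

# On an exactly linear series the formula recovers the slope exactly.
linear = [1735 + (k - 13) * (-9) for k in range(1, 26)]
slope = secular_coefficient(linear)      # -9.0
```

For N = 25 the divisor N(N² - 1)/12 equals 1,300, the value used in the stillbirth computation below.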
The Secular Annual Decrease of the Number of Stillbirths in
Denmark.
s = 70,000, N = 25, M = 1,735.

Year.   k.     m_k.    m_k - M.   k - (N+1)/2.   (k - (N+1)/2)(m_k - M).
1888    1     1,861     +126         -12              - 1,512
  89    2     1,924     +189         -11              - 2,079
1890    3     1,830     +105         -10              - 1,050
  91    4     1,779     + 44         - 9              -   396
  92    5     1,811     + 76         - 8              -   808
  93    6     1,788     + 53         - 7              -   371
  94    7     1,719     - 16         - 6              +    96
1895    8     1,753     + 18         - 5              -    90
  96    9     1,714     - 21         - 4              +    84
  97   10     1,811     + 76         - 3              -   228
  98   11     1,797     + 62         - 2              -   124
  99   12     1,737     +  2         - 1              -     2
1900   13     1,696     - 39           0                    0
  01   14     1,732     -  3         + 1              -     3
  02   15     1,694     - 41         + 2              -    82
  03   16     1,685     - 50         + 3              -   150
  04   17     1,682     - 53         + 4              -   212
1905   18     1,705     - 30         + 5              -   150
  06   19     1,620     -115         + 6              -   690
  07   20     1,723     - 12         + 7              -    84
  08   21     1,694     - 41         + 8              -   328
  09   22     1,665     - 70         + 9              -   630
1910   23     1,658     - 77         +10              -   770
  11   24     1,658     - 77         +11              -   847
1912   25     1,638     - 97         +12              - 1,164
Sum:                                                  -11,590
As an example illustrating secular fluctuations I take the
previously discussed series of stillbirths in Denmark.
We have in this case N(N² - 1)/12 = 1,300, hence:

sβ = - 11,590 : 1,300 = - 8.92.

From this we may draw the conclusion that the number of still-
births in Denmark per 70,000 births per annum on the average
is decreased by 8.92 each year.
If the fluctuations are of an essentially secular character we may
write

m_k = M + (k - 13)(- 8.92)

as the number of stillbirths per annum. Apart from accidental
fluctuations due to sampling we should therefore obtain a nearly
normal series for the 25-year period if we calculated the number
of stillbirths each year according to the expression: m_k
- (k - 13)(- 8.92). Such a computation is given below:
Number of Stillbirths in Denmark Freed from Secular Fluctuations.
s = 70,000, N = 25.

Year.   k.   m_k - (k - 13)(-8.92).      Year.   k.   m_k - (k - 13)(-8.92).
1888    1          1,754                 1900    13         1,696
  89    2          1,826                   01    14         1,741
1890    3          1,741                   02    15         1,712
  91    4          1,699                   03    16         1,712
  92    5          1,740                   04    17         1,718
  93    6          1,726                 1905    18         1,730
  94    7          1,666                   06    19         1,675
1895    8          1,708                   07    20         1,785
  96    9          1,678                   08    21         1,765
  97   10          1,784                   09    22         1,745
  98   11          1,779                 1910    23         1,747
  99   12          1,728                   11    24         1,756
                                         1912    25         1,745
A computation of the characteristics of this series gives:

M = 1,735, σ = 37.09, σ_B = 41.6, 100ρ imaginary.

The dispersion is now slightly subnormal and the coefficient
of disturbancy is imaginary, whereas in the original series in
§ 88 it had a value equal to 3.4.
92. Cancer Statistics. — Mr. F. L. Hoffman in his treatise "The
Mortality from Cancer Throughout the World" gives some very
interesting statistics on mortality from cancer in various localities.
Through the kindness of Mr. Hoffman I am able to submit the
following series relating to cancer among males in the City of
New York (Manhattan and Bronx Boroughs):

Deaths from Cancer (m_k) in the City of New York as Reduced to a
Stationary Population of 1,000,000.
s = 1,000,000, N = 25, M = 560.
Year.   k.    m_k.   m_k - M.   k - (N+1)/2.   (k - (N+1)/2)(m_k - M).
1889    1     377     -183         -12               2,196
1890    2     476     - 84         -11                 924
  91    3     410     -150         -10               1,500
  92    4     444     -116         - 9               1,044
  93    5     462     - 98         - 8                 784
  94    6     423     -137         - 7                 959
1895    7     442     -118         - 6                 708
  96    8     493     - 67         - 5                 335
  97    9     505     - 55         - 4                 220
  98   10     515     - 45         - 3                 135
  99   11     513     - 47         - 2                  94
1900   12     547     - 13         - 1                  13
  01   13     595     + 35           0                   0
  02   14     540     - 20         + 1                - 20
  03   15     580     + 20         + 2                  40
  04   16     609     + 49         + 3                 147
1905   17     639     + 79         + 4                 316
  06   18     619     + 59         + 5                 295
  07   19     658     + 98         + 6                 588
  08   20     631     + 71         + 7                 497
  09   21     683     +123         + 8                 984
1910   22     710     +150         + 9               1,350
  11   23     710     +150         +10               1,500
  12   24     721     +161         +11               1,771
1913   25     718     +158         +12               1,896
Sum: Σ =                                            18,276
A computation of the dispersion and the Charlier coefficient
of disturbancy gives a value of 100ρ in the neighborhood of 18,
indicating marked fluctuations. An inspection of the series shows
immediately that there is a marked increase in the rate of death
from cancer. Working out the secular disturbances in the ordi-
nary manner we find:

sβ = 18,276 : 1,300 = + 14.06,

indicating an increase of deaths from cancer of about 14 persons
per annum for a population of 1,000,000. Eliminating the secular
disturbances in the same manner as above, we now get a coefficient
of disturbancy equal to 0.983i (i = √-1), practically a normal
dispersion when taking into account the mean error due to
sampling.
93. Application of the Lexian Dispersion Theory in Actuarial
Theory. Conclusion. — The Russian actuary, Jastremsky, has
applied the Lexian Dispersion Theory in testing the influence of
medical selection in life assurance.¹ The research by Jastremsky
revolves around the following question: Is medical selection a
phenomenon independent of the age of the assured? Let q_x^(t)
denote the observed rate of mortality after t years' duration of
assurance. In the same manner q_x^(5) denotes the rate of mor-
tality of a life aged x after 5 or more years of duration (t ≥ 5).
Forming the ratio q_x^(t) : q_x^(5) for all ages x we obtain a certain
homograde series for which we may compute the Lexian Ratio
and the Charlier Coefficient and thus determine if the fluctuations
are due to sampling only or dependent on the age of the assured.
Space does not allow us to give a detailed account of the very
interesting research by Jastremsky as applied to the Austro-
Hungarian Mortality Table (Vienna, 1909), and we shall limit
ourselves to quoting his final results as to the Lexian Ratio, L,
for Whole Life Assurances and Endowment Assurances:
t.    Whole Life Assurances, L.    Endowment Assurances, L.
1              0.88                        1.01
2              0.89                        0.96
3              1.12                        1.05
4              1.05                        0.98
5              1.07                        0.91
The above values of L all lie close to unity and the series may
therefore be considered as a Bernoullian Series where the fluctu-
¹ Jastremsky: "Der Auslese-Koeffizient," Zeitschr. f. d. ges. Vers.-Wiss.,
Band XII, 1912.
ations are due to sampling entirely. Or, in other words, the ratio
φ_t = q_x^(t) : q_x^(5) is a quantity independent of the age of the
assured.
The great majority of statistical series may be subjected to an
analysis similar to that given in the preceding chapters. The char-
acteristics described previously, the Lexian Ratio and the
coefficient of disturbancy, tell us the magnitude of possible fluc-
tuations from sample to sample. In many cases we may, by means
of the secular coefficient of disturbancy, β, partly or wholly
eliminate such fluctuations due to secular causes, and thus be in
a better position to study the periodical fluctuations.
A statistical research may be likened to the navigation of a
difficult waterway, full of hidden rocks and skerries out of sight
of the navigator. The amateur statistician, sailing the ocean
in a blind and happy-go-lucky manner, often comes to grief on
those rocks and suffers a total shipwreck. The skillful navigator,
the mathematically trained statistician, is always on the lookout
for the sea marks. In the Lexian Ratio and the Charlier Coef-
ficient of Disturbancy he recognizes a beacon light, often signal-
ling "Danger ahead." He stops his engines. In case he does
not possess the particular charts giving the exact location of the
hidden reefs his prudence advises him to call a pilot to bring his
ship safely into harbor. On the other hand, if he has reliable
charts and knows his profession thoroughly he may venture
forth and do his own piloting by a study of the charts. It is
to the study of such charts — i. e., a special study of the higher
statistical characteristics — that we shall turn our attention in
the second part of this treatise. The reader who has followed us
up to this point may perhaps feel discouraged by realizing how
little he has gained in knowledge after having learned a mass of
technical details and formulas. We can quite appreciate and
understand this feeling. So far, he has perhaps chiefly been
impressed by the treacherous and misleading character of sta-
tistical mass phenomena, but to recognize a danger signal and
thus avoid the pitfalls is one of the fundamental essentials of
safe navigation in statistical research.
ADDENDA.
APPENDIX AND BIBLIOGRAPHICAL NOTES.
Chapter I.
Page 3. The establishment of the relations between hypothetical judg-
ments and probabilities is probably first due to F. C. Lange. See also the
discussion in Sigwart's "Logic" (English translation, Macmillan Co., New
York, 1904). A defense of the "principle of insufficient reason" as opposed
to the view of von Kries is given by K. Stumpf ("Über den Begriff der mathe-
matischen Wahrscheinlichkeit"), Ber. bayr. Ak. (phil. Kl.), 1892. For a
further discussion of the philosophical aspect the reader is advised to consult
"Theorie und Methoden der Statistik" (Tübingen, 1913) by the Russian
statistician, A. Kaufmann.
Chapter II.
Page 21. An interesting account of the application of the theory of proba-
bilities to whist is given by Poole in "Philosophy of Whist Play" (New York
and London, 1883). Page 23. Example 6. This is a general case of the
so-called game of "Treize" or "Rencontre" first discussed by Montmort in his
"Essai sur les Jeux des Hazards" (1708). "Thirteen cards numbered 1, 2,
3, ... up to 13 are thrown promiscuously into a bag and then drawn out
singly; required the chance that once at least the number on a card shall
coincide with the number expressing the order in which it is drawn." This is
one of the stock problems in probability and has been discussed by nearly all
the leading classical writers on the subject.
Chapter IV.
The close connection between probability and symbolic logic is admirably
discussed by the Italian mathematician, Peano, in various of his mathematical
texts. Page 42. Example 19. See also the discussion by R. Henderson in
"Mortality and Statistics" (New York, 1915).
Chapter V.
38. The moral expectation has been discussed by Harald Westergaard
in "Tidsskrift for Matematik" (1878) and in "Smaaskrifter tilegnede C. F.
Krieger" (Copenhagen, 1889).
Chapter VI.
A German translation with explanatory notes of Bayes's brochure has
recently appeared in the series of "Ostwald's Klassiker."
Page 74. The double integral in the numerator of (IV) is evidently of the
form:

∫∫_(A) F(y1·y2) dy1 dy2,
where the contour of the field of integration (A) is defined by means of the
relations:

α < y1y2 < β, 0 < y1 < 1 and 0 < y2 < 1.

The field of integration is thus the area swept out by the hyperbola
y1y2 = α, the straight line y2 = 1, the hyperbola y1y2 = β and the straight
line y1 = 1.
Changing the variables by means of the transformation:

y1y2 = y and (1 - y1) : (1 - y) = z,

we get the following new double integral

∫∫ F[φ(y, z)·ψ(y, z)]·|J| dy dz   (|J| taken as absolute value),

where J is the Jacobian or functional determinant defined by the formula:

J = (∂φ/∂y)(∂ψ/∂z) - (∂φ/∂z)(∂ψ/∂y).

For

y2 = y : [1 - z(1 - y)] = φ(y, z),
y1 = 1 - z(1 - y) = ψ(y, z),

we find

|J| = (1 - y) : [1 - z(1 - y)].
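The Jacobian just found can be verified numerically. The sketch below, an illustration of our own (the helper names are not from the text), compares a central-difference estimate of the functional determinant with (1 - y) : [1 - z(1 - y)] at an interior point:

```python
# Numerical check of |J| = (1 - y) / (1 - z(1 - y)) for the substitution
# y1 = 1 - z(1 - y),  y2 = y / (1 - z(1 - y)).
def y1(y, z):
    return 1 - z * (1 - y)

def y2(y, z):
    return y / (1 - z * (1 - y))

def jacobian_det(y, z, h=1e-6):
    # central differences for the four partial derivatives
    d11 = (y1(y + h, z) - y1(y - h, z)) / (2 * h)   # dy1/dy
    d12 = (y1(y, z + h) - y1(y, z - h)) / (2 * h)   # dy1/dz
    d21 = (y2(y + h, z) - y2(y - h, z)) / (2 * h)   # dy2/dy
    d22 = (y2(y, z + h) - y2(y, z - h)) / (2 * h)   # dy2/dz
    return d11 * d22 - d12 * d21

y, z = 0.3, 0.4
numeric = jacobian_det(y, z)
exact = (1 - y) / (1 - z * (1 - y))
```

At (y, z) = (0.3, 0.4) both expressions give 0.7 : 0.72, confirming the closed form.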
The transformation in a double integral implies in general three parts:
(1) the expression of F(y1y2) in terms of y, z; (2) the determination of the new
system of limits; (3) the substitution for dy1dy2. The solution of the third part we
just gave above. The solution of the first two is purely algebraical. The
first part is a straightforward simple problem which should present no difficulty
[Figure: the transformed field of integration, a rectangle bounded by y = α, y = β, z = 0 and z = 1.]
¹ See Goursat: "Mathematical Analysis" (New York, 1904), pages 264-67.
whatsoever to the student, and which in conjunction with (3) brings the in-
tegrands on the form given in formula (V).
The easiest way to determine the new system of limits is probably by con-
structing the contour in the new field of integration. The hyperbolas
y1y2 = α and y1y2 = β are in the new field of integration changed into the two
straight lines y = α and y = β, which determine the limits for the variable y.
A mere inspection of the expressions for φ(y, z) and ψ(y, z) shows that the two
straight lines y2 = 1 and y1 = 1 become in the new field z = 1 and z = 0,
which are the limits for z.
The contour (A1) simply becomes a rectangle bounded by the straight
lines z = 0, y = β, z = 1 and y = α. The complete transformation finally
brings the numerator on the form as given in (V).
Page 75. The question put by Mr. Bing is simply the determination of a
future event by means of Bayes's Rule. The limits α and β become 0 and 1
respectively, and the contour of the field of integration simply becomes the
area bounded by y1y2 = 0, y2 = 1, y1y2 = 1 and y1 = 1, i. e., the area enclosed
between the two axes, the line y2 = 1, the hyperbola y1y2 = 1 and the line
y1 = 1. The transformed contour becomes a square with side equal to unity.
Chaptbr VIL
Page 83. The criticism by the English empiricists is to a certain extent due to a misconception of the Bernoullian Theorem. "This theorem," Venn says, "is generally expressed somewhat as follows: That in the long run all events will tend to occur with a frequency proportional to their objective probabilities." Any one giving careful attention to the deduction of the famous theorem will, however, readily notice the fallacy of such a view. Not the actual absolute frequencies of the events but the mathematical expectations of such events are proportional to the a priori mathematical probability p. The fallacy of Mr. Venn lies in his confusing an actual event with its mathematical expectation. In other words, he makes the Bernoullian Theorem appear as a regular hypothetical judgment whereas as a matter of fact it is a simple probability judgment. If one is to take such an erroneous view of the Bernoullian Theorem one may even be reconciled with another startling statement by Venn that "If the chance (against the happening of a certain event) be 1,023 to 1 it undoubtedly will happen once in 1,024 trials."
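Venn's "1,023 to 1" statement can be put to a direct trial by simulation: the mathematical expectation of the count in 1,024 trials is indeed one, but the actual count fluctuates, and only a little over a third of such series contain exactly one occurrence. A minimal sketch (the number of runs and the seed are arbitrary choices, not from the text):

```python
# Simulate many series of 1,024 trials of an event whose chance is
# 1 in 1,024, and compare the expected count (1) with the actual counts.
import random

random.seed(2)
p = 1 / 1024
counts = []
for _ in range(2000):                      # 2,000 series of 1,024 trials
    hits = sum(random.random() < p for _ in range(1024))
    counts.append(hits)

mean_count = sum(counts) / len(counts)     # close to the expectation, 1
share_exactly_one = counts.count(1) / len(counts)
print(mean_count, share_exactly_one)       # mean near 1; exactly-one share well below 1
```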
For a clear presentment of the empirical methods and their relation to mathematical probabilities and deductive methods see v. Bortkiewicz, "Kritische Betrachtungen zur theoretischen Statistik" (Jahrb. f. N.-Oe. u. Stat., 3. Folge, Bd. 8, 10, 11) and "Die statistischen Generalisationen" (Scientia, Vol. V). v. Bortkiewicz is but one of the brilliant school of Russian statisticians who have made a thorough study of the philosophical aspects of statistics. The induction method of J. S. Mill is carried much farther and put on a far sounder basis than that originally given by Mill in the brochure "Die Statistik als Wissenschaft" by A. A. Tschuprow as well as in his Russian text "Researches on the Theory of Statistics." The main ideas of the Russian writers are also found in Kaufmann's "Theorie und Methoden der Statistik" (Tübingen, 1913).
ADDENDA. 171
Chapter IX.
Page 96. For a closer approximation of n! see Forsyth, A. R., "On an Approximate Expression for x!" (Brit. Ass. Rep., 1883).

Page 107. In this discussion it must be remembered that the variables are independent of each other. The formula ε(ka) = kε(a) is self evident, but may be proved as follows:
ε(ka) = Σkap = kε(a),  ε[ka − ε(ka)]² = k²ε[a − ε(a)]² = k²ε²(a), or ε(ka) = kε(a).
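The scaling rules just proved can be verified numerically. A minimal sketch with an arbitrary discrete distribution (the values, probabilities and factor k are illustrative):

```python
# Check that the expectation of ka is k times the expectation of a, and
# that the mean error square of ka is k^2 times that of a.
values = [0, 1, 2, 3]
probs = [0.1, 0.4, 0.3, 0.2]
k = 5

def expect(vals):
    # Mathematical expectation: sum of value times probability.
    return sum(v * q for v, q in zip(vals, probs))

m = expect(values)                                   # ε(a)
mk = expect([k * v for v in values])                 # ε(ka)
msq = expect([(v - m) ** 2 for v in values])         # ε[a − ε(a)]²
msqk = expect([(k * v - mk) ** 2 for v in values])   # ε[ka − ε(ka)]²

print(mk, k * m)          # ε(ka) equals k·ε(a)
print(msqk, k * k * msq)  # the mean error square scales by k²
```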
Page 115. See also a similar discussion by Westergaard in "Mortalität und Morbilität" (Jena, 1902), page 187.
Chapter XI.
The still unfinished series of monographs by Charlier are found in various volumes of Meddelande från Lunds Astronomiska Observatorium (Lund, Stockholm) and in Svenska Aktuarieföreningens Tidskrift (Stockholm).
Page 137. Since all statistical characteristics to a greater or less extent are affected with mean errors due to sampling it is of importance to be able to determine such mean errors in simple algebraic terms. We shall for the present confine ourselves to the mean and the dispersion. The mean error in the mean, M, in a Bernoullian Series is given by the formula:

ε(M) = √(ε²(m₁) + ε²(m₂) + ⋯ + ε²(m_N)) / N = √(Npq)/N = σ/√N.
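A simulation makes it easy to check that the scatter of the mean of a Bernoullian series follows √(pq/N). The probability, series length, number of runs and seed below are arbitrary illustrative choices:

```python
# Simulate many Bernoullian series, compute the mean of each, and compare
# the observed dispersion of those means with the formula sqrt(pq/N).
import random
from math import sqrt

random.seed(1)
p, N, runs = 0.3, 400, 4000
means = []
for _ in range(runs):
    hits = sum(random.random() < p for _ in range(N))
    means.append(hits / N)

grand = sum(means) / runs
observed = sqrt(sum((x - grand) ** 2 for x in means) / runs)
predicted = sqrt(p * (1 - p) / N)
print(observed, predicted)    # the two agree closely
```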
The mean error of the dispersion is somewhat difficult to obtain by elementary methods since it involves the determination of the mean error of the mean error. The mean error square of the mean error square may be gotten by a process similar to that of Laplace in §§ 65–66 by the introduction of the parameter, t, in the expression for σ² and σ⁴ in ε[(a − ap)² − apq]². After several reductions this latter expression may be brought to the form: 2(apq)² = 2σ⁴ (approx.).
For the dispersion we have:

ε(σ) = σ/√(2N).
This formula will be proven under the discussion of frequency curves.
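The mean error of the dispersion can also be checked empirically under the normal law of errors, taking it as σ/√(2N); that reading of the damaged display is an assumption, and the value of σ, the sample size and the number of runs below are illustrative:

```python
# Simulate many normal samples, compute the dispersion of each, and
# compare the scatter of those dispersions with sigma / sqrt(2N).
import random
from math import sqrt

random.seed(7)
sigma, N, runs = 2.0, 400, 3000
disps = []
for _ in range(runs):
    draws = [random.gauss(0.0, sigma) for _ in range(N)]
    m = sum(draws) / N
    disps.append(sqrt(sum((d - m) ** 2 for d in draws) / N))

grand = sum(disps) / runs
observed = sqrt(sum((s - grand) ** 2 for s in disps) / runs)
predicted = sigma / sqrt(2 * N)
print(observed, predicted)    # the two agree closely
```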